THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 




Tests of Statistical Hypotheses which are Unbiased in the Limit. 


The Transformation of Statistics to Simplify their Distribution. 


HAROLD HOTELLING AND LESTER R. FRANKEL


On Combined Expansions of Products of Symmetric Power Sums
and of Sums of Symmetric Power Products with Applications
to Sampling (Continued). PAUL S. DWYER


Distributions of Sums of Squares of Rank Differences for Small
Numbers of Individuals. E. G. OLDS


Note on Correlations. D. DELURY


Vol. IX, No. 2 — June, 1938 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor 


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. CARVER   R. A. FISHER   R. DE MISES

H. CRAMÉR   T. C. FRY   E. S. PEARSON
W. E. DEMING   H. HOTELLING   H. L. RIETZ

G. DARMOIS   W. A. SHEWHART


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts should be typewritten double-spaced with wide margins, and the original copy should be submitted. Footnotes should be reduced to a minimum and whenever possible replaced by a bibliography at the end of the paper; formulae in footnotes should be avoided. Figures, charts, and diagrams should be drawn on plain white paper or tracing cloth in black India ink twice the size they are to be printed. Authors are requested to keep in mind typographical difficulties of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be furnished free. Additional reprints and covers furnished at cost.


The subscription price for the ANNALS is $4.00 per year. Single copies $1.25. Back numbers are available at the following rates:


Vols. I-IV $5.00 each. Single numbers $1.50.
Vols. V to date $4.00 each. Single numbers $1.25.


Subscriptions, renewals, orders for back numbers and other business communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the Institute of Mathematical Statistics.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.























TESTS OF STATISTICAL HYPOTHESES WHICH ARE UNBIASED IN 
THE LIMIT 


By J. NEYMAN 


1. Introduction. The idea of unbiased tests of statistical hypotheses has been put forward and discussed in two recent papers.¹ Recently also a particular problem was solved by introducing a test which has the property of being unbiased in the limit.² The purpose of the present note is to discuss this conception in its general form and to indicate methods of determining tests unbiased in the limit for a broad class of simple statistical hypotheses. The notation and the terminology employed below are explained in the papers quoted.


2. Notation and definitions. Consider a set of \(n\) random variables

\[ X_1, X_2, \dots, X_n \tag{1} \]

the particular values of which

\[ x_1, x_2, \dots, x_n \tag{2} \]

can be given by observation, and denote by \(\Omega\) the set of hypotheses concerning the probability law of (1) which are regarded as admissible. We shall assume that all the hypotheses included in \(\Omega\) specify the probability law of the \(X\)'s as having the same analytical form, but differ among themselves in the value of just one parameter, \(\theta\). Thus, if \(E_n\) denotes the point (the "event point") in the space \(W_n\) of \(n\) dimensions with its coordinates equal to the values of (1), and \(w_n\) any region in \(W_n\), then the probability of \(E_n\) falling within \(w_n\), as determined by any of the hypotheses forming the set \(\Omega\), will be denoted by

\[ P\{E_n \in w_n \mid \theta\} \tag{3} \]

and will be a function of the parameter \(\theta\). The probability (3) with fixed \(\theta\), considered as a function of varying \(w_n\), is called the integral probability law of the \(X\)'s. Frequently (3) is equal to the integral of a certain non-negative function of \(E_n\) over the region \(w_n\). This function will always be denoted by \(p(E_n \mid \theta)\) and called the elementary probability law of (1).


¹ J. Neyman and E. S. Pearson: Contributions to the Theory of Testing Statistical Hypotheses. Part I, Stat. Res. Memoirs, Vol. I (1936), pp. 1-37; Part II, ibid., Vol. II (1938).

J. Neyman: Sur la vérification des hypothèses statistiques composées. Bull. Soc. Math. de France, Vol. 63 (1935), pp. 246-266.

² J. Neyman: "Smooth" Test for Goodness of Fit. Skandinavisk Aktuarietidskrift (1937), pp. 149-199.


69 





70 J. NEYMAN 


Denote by \(H_0\) some particular hypothesis of the set \(\Omega\) and by \(\theta_0\) the value that it ascribes to the parameter \(\theta\).

A test of the statistical hypothesis \(H_0\) consists in a rule of rejecting \(H_0\) whenever \(E_n\) falls within a specified region \(w_n\), and of not rejecting it in other cases. The region \(w_n\) used for this purpose is called the critical region. It follows that to choose a test means to choose a critical region.

We shall consider below only cases such that, for any region \(w_n\), the probability (3), considered as a function of \(\theta\), possesses two successive derivatives.

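DEFINITION 1. If a critical region \(\bar w_n\) has the property that, \(\alpha\) being a fixed positive number,

\[ \text{(a)}\quad P\{E_n \in \bar w_n \mid \theta_0\} = \alpha, \tag{4} \]

\[ \text{(b)}\quad \left.\frac{d}{d\theta} P\{E_n \in \bar w_n \mid \theta\}\right|_{\theta=\theta_0} = 0, \tag{5} \]

\[ \text{(c)}\quad \left.\frac{d^2}{d\theta^2} P\{E_n \in \bar w_n \mid \theta\}\right|_{\theta=\theta_0} \ge \left.\frac{d^2}{d\theta^2} P\{E_n \in w_n \mid \theta\}\right|_{\theta=\theta_0}, \tag{6} \]

where \(w_n\) is any region satisfying (a) and (b), then the region \(\bar w_n\) is called the unbiased critical region of type A corresponding to the level of significance \(\alpha\), and the test of the hypothesis \(H_0\) based on \(\bar w_n\) the unbiased test of type A.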

This is the definition given in the first of the earlier papers quoted. Now we shall define the test which is unbiased in the limit. For this purpose we shall have to consider the situation where \(n\) is indefinitely increased, and consequently we have a sequence of probability laws (3), a sequence of spaces \(W_n\) where they are defined, and a sequence of regions \(\bar w_n\), each \(\bar w_n\) being a part of the corresponding \(W_n\).

We must also introduce a varying scale with which to measure the differences \(\theta - \theta_0\). This is due to the fact that, if the choice of the sequence of regions \(\bar w_n\) is not very unlucky and \(\theta \neq \theta_0\), then we shall frequently have

\[ \lim_{n\to\infty} P\{E_n \in \bar w_n \mid \theta\} = 1. \tag{7} \]

Comparing this with condition (4), we see that in general the limit of

\[ P\{E_n \in \bar w_n \mid \theta\} \]

for \(n \to \infty\) will be discontinuous at \(\theta = \theta_0\). To avoid this we shall measure \(\theta - \theta_0\) in units of \(n^{-1/2}\), introducing instead of \(\theta\) a new parameter \(\vartheta\) connected with the former by means of the equality

\[ \theta = \theta_0 + \frac{\vartheta}{\sqrt n}. \tag{8} \]

For the hypothesis tested, \(H_0\), we shall have \(\vartheta = 0\), and \(\vartheta \neq 0\) for any other hypothesis in \(\Omega\). The new parameter \(\vartheta\) thus introduced will be called the















UNBIASED TESTS OF STATISTICAL HYPOTHESES 71 





standardized error in \(\theta_0\). It will frequently be convenient to use \(\theta\), but occasionally we shall use \(\vartheta\) as well, for example writing \(P\{E_n \in w_n \mid \vartheta\}\) instead of (3), etc., and it is necessary to remember the connection (8) existing between \(\theta\) and \(\vartheta\). It may be useful to notice at once that, by (8), \(df/d\theta = \sqrt n\, df/d\vartheta\).

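DEFINITION 2. We shall say that the sequence of regions

\[ \bar w_1, \bar w_2, \dots, \bar w_n, \dots \tag{9} \]

determines a test of the hypothesis \(H_0\) which is unbiased in the limit and corresponds (in the limit) to the level of significance \(\alpha\) if, for any \(n\),

\[ \text{(d)}\quad \left.\frac{d^2}{d\vartheta^2} P\{E_n \in \bar w_n \mid \vartheta\}\right|_{\vartheta=0} \ge \left.\frac{d^2}{d\vartheta^2} P\{E_n \in w_n \mid \vartheta\}\right|_{\vartheta=0}, \tag{10} \]

where \(w_n\) is any region such that

\[ P\{E_n \in w_n \mid \vartheta = 0\} = P\{E_n \in \bar w_n \mid \vartheta = 0\} \tag{11} \]

and

\[ \left.\frac{d}{d\vartheta} P\{E_n \in w_n \mid \vartheta\}\right|_{\vartheta=0} = \left.\frac{d}{d\vartheta} P\{E_n \in \bar w_n \mid \vartheta\}\right|_{\vartheta=0}, \tag{12} \]

and if

\[ \text{(e)}\quad \lim_{n\to\infty} P\{E_n \in \bar w_n \mid \vartheta = 0\} = \alpha, \tag{13} \]

\[ \text{(f)}\quad \lim_{n\to\infty} \left.\frac{d}{d\vartheta} P\{E_n \in \bar w_n \mid \vartheta\}\right|_{\vartheta=0} = 0. \tag{14} \]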



















The practical application of the test determined by the sequence of regions (9) consists in observing as large a number \(n\) of the \(X\)'s of (1) as is practicable and in rejecting the hypothesis \(H_0\) whenever \(E_n\) falls within \(\bar w_n\). If \(n\) is sufficiently large, then this rule will have about the same advantages as the application of the unbiased test of type A. In fact, allowing for the circumstance that the values of (11) and (12) will be only approximately equal to the limits (13) and (14), the properties of the test satisfying Definition 2 will be as follows. If the hypothesis tested is true, it will be wrongly rejected with a relative frequency approximately equal to the \(\alpha\) fixed in advance. If \(H_0\) is false and the true value, say \(\vartheta'\), of \(\vartheta\) is not very different from zero, then the frequency of rejecting \(H_0\) will be greater than \(\alpha\) and could not be increased by applying some other similar test.

It may be useful to notice that in general there may be more than one test 
of the same hypothesis which is unbiased in the limit and corresponds to a 
fixed level of significance. Consequently there is a possibility of choosing 
between such tests, but it seems to the author that such a choice would require a 
previous strengthening of the theorem of S. Bernstein on which the present 
work is based. 




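3. Theorem of S. Bernstein. In the following we shall have to use a particular case of a theorem due to S. Bernstein.³ Denote by \(\mathcal{E}(z)\) the mathematical expectation of any variate \(z\), and by

\[ X_1, X_2, \dots, X_n, \dots; \qquad Y_1, Y_2, \dots, Y_n, \dots \tag{15} \]

two unlimited sequences of random variables. We shall assume that

(1) The pair \((X_i, Y_i)\) is independent of the pair \((X_j, Y_j)\) for any \(i \neq j\).

(2) The following mathematical expectations exist and are independent of \(i\):

\[
\begin{gathered}
\mathcal{E}(X_i) = a, \qquad \mathcal{E}(Y_i) = b, \qquad \mathcal{E}(X_i - a)^2 = \sigma_1^2, \qquad \mathcal{E}(Y_i - b)^2 = \sigma_2^2,\\
\mathcal{E}[(X_i - a)(Y_i - b)] = r\sigma_1\sigma_2, \qquad \mathcal{E}(|X_i - a|^3), \qquad \mathcal{E}(|Y_i - b|^3).
\end{gathered} \tag{16}
\]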


Consider now the space \(W_{2n}\) of \(2n\) dimensions, and denote by \(E_{2n}\) a point in it as determined by the values of \(X_i, Y_i\) for \(i = 1, 2, \dots, n\) considered as its coordinates. Let \(u_n\) and \(v_n\) denote the sums

\[ u_n = \sum_{i=1}^n X_i, \qquad v_n = \sum_{i=1}^n Y_i, \tag{17} \]

and denote by \(D_n\) the point on a plane \(S\) with its orthogonal coordinates equal to \(u_n\) and \(v_n\). If \(s\) is any region in \(S\), let \(P\{D_n \in s\}\) be the probability of \(D_n\) falling within \(s\).

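THEOREM OF S. BERNSTEIN. If the variates (15) satisfy the conditions (1) and (2), then for any \(\varepsilon > 0\) there exists a number \(N_\varepsilon\) such that the inequality \(n > N_\varepsilon\) implies

\[
\left| P\{D_n \in s\} - \frac{1}{2\pi n\sigma_1\sigma_2\sqrt{1-r^2}} \iint_s \exp\left\{-\frac{1}{2n(1-r^2)}\left[\frac{(u-na)^2}{\sigma_1^2} - 2r\,\frac{(u-na)(v-nb)}{\sigma_1\sigma_2} + \frac{(v-nb)^2}{\sigma_2^2}\right]\right\} du\,dv \right| < \varepsilon, \tag{18}
\]

whatever the region \(s\) in \(S\) may be.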


4. Tests unbiased in the limit. We shall consider the problem of determining 
the tests satisfying Definition 2, in the case where the following hypotheses are 
fulfilled. 


³ S. Bernstein: Sur un théorème limite du calcul des probabilités. Math. Ann., Bd. 97 (1926), p. 44.

See also V. Romanovskij, Bull. de l'Académie des Sciences de l'U. R. S. S., 1929, p. 209, and W. Kozakiewicz, Ann. Soc. Polonaise Math., t. XIII (1934), pp. 24-43.







(i) All the random variables (1) are mutually independent and each of them follows the same elementary probability law, which we shall denote by \(p(x_i \mid \theta)\).

(ii) The elementary probability law \(p(x_i \mid \theta)\) admits three differentiations with respect to \(\theta\), and two consecutive differentiations with respect to \(\theta\) under the integral sign taken over any fixed finite or infinite interval, so that

\[ \frac{\partial^k}{\partial\theta^k}\int_a^b p(x_i \mid \theta)\,dx_i = \int_a^b \frac{\partial^k}{\partial\theta^k}\, p(x_i \mid \theta)\,dx_i \tag{19} \]

for \(k = 1, 2\).
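(iii) If

\[ \varphi_i = \left.\frac{\partial \log p(x_i \mid \theta)}{\partial\theta}\right|_{\theta=\theta_0} \quad\text{and}\quad \psi_i = \left.\frac{\partial^2 \log p(x_i \mid \theta)}{\partial\theta^2}\right|_{\theta=\theta_0}, \tag{20} \]

then we shall assume the existence of the following integrals, all taken from \(-\infty\) to \(+\infty\):

\[ \sigma_1^2 = \int \varphi_i^2\, p(x_i \mid \theta_0)\,dx_i, \tag{21} \]

\[ r\sigma_1\sigma_2 = \int \varphi_i(\psi_i + \sigma_1^2)\, p(x_i \mid \theta_0)\,dx_i, \tag{22} \]

\[ \sigma_2^2 = \int (\psi_i + \sigma_1^2)^2\, p(x_i \mid \theta_0)\,dx_i, \tag{23} \]

\[ \int |\varphi_i|^3\, p(x_i \mid \theta_0)\,dx_i, \tag{24} \]

\[ \int |\psi_i + \sigma_1^2|^3\, p(x_i \mid \theta_0)\,dx_i. \tag{25} \]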


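PROPOSITION I. If the above conditions (i), (ii) and (iii) are satisfied, \(\psi_i\) being a function of \(x_i\) and \(|r| < 1\), then the sequence of regions \(\bar w_n\), including all the points of \(W_n\) where \(p(E_n \mid \theta_0) = 0\) and also those of the remaining ones which satisfy the inequality

\[ \sum_{i=1}^n \psi_i + \Big(\sum_{i=1}^n \varphi_i\Big)^2 > M\sigma_2\sqrt{n(1-r^2)} - n\sigma_1^2 + r\,\frac{\sigma_2}{\sigma_1}\sum_{i=1}^n \varphi_i, \tag{26} \]

where the coefficient \(M\) is to be found from the equation

\[ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2/2}\left\{\frac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-y^2/2}\,dy\right\}dx = \alpha \tag{27} \]

with

\[ N = \frac{\sigma_1^2\sqrt n}{\sigma_2\sqrt{1-r^2}}, \tag{28} \]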







defines a test of the hypothesis \(H_0\) which is unbiased in the limit and corresponds (in the limit) to the level of significance \(\alpha\).
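The role of \(N\) in (27) and (28) may be easier to see from a short worked derivation. It is an editorial gloss, not part of the original argument, and it uses the standardized coordinates that the proof below introduces in (47) and (48), namely \(x = u_n/(\sigma_1\sqrt n)\) and \(y = (v_n + n\sigma_1^2 - r\sigma_2 u_n/\sigma_1)/(\sigma_2\sqrt{n(1-r^2)})\), with \(u_n = \sum\varphi_i\) and \(v_n = \sum\psi_i\) as in (39). Because \(u_n^2 = n\sigma_1^2 x^2\), the inequality (26) describes a parabolic region in the \((x, y)\) plane:

\[
\begin{aligned}
v_n + u_n^2 &> M\sigma_2\sqrt{n(1-r^2)} - n\sigma_1^2 + r\,\frac{\sigma_2}{\sigma_1}\,u_n\\
\iff \frac{v_n + n\sigma_1^2 - r\sigma_2 u_n/\sigma_1}{\sigma_2\sqrt{n(1-r^2)}} &> M - \frac{n\sigma_1^2\,x^2}{\sigma_2\sqrt{n(1-r^2)}}\\
\iff y &> M - N x^2, \qquad N = \frac{\sigma_1^2\sqrt n}{\sigma_2\sqrt{1-r^2}}.
\end{aligned}
\]

Under \(H_0\) the pair \((x, y)\) is asymptotically a pair of independent standard normal variables, so the left-hand side of (27) is the limiting probability of this region.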

Remark. The calculation of \(M\) satisfying the equation (27) is, of course, laborious. But a table of values of \(M\) corresponding to varying values of \(N\) is being constructed by N. L. Johnson at the Department of Statistics, University College, London, and it is hoped that it will soon be published.
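Pending such a table, \(M\) can also be found numerically. The following is a minimal sketch, not the author's or Johnson's procedure. It evaluates the left-hand side of (27) by Simpson's rule in \(x\), writing the inner integral in closed form as \(1 - \Phi(M - Nx^2)\), and it solves for \(M\) by bisection. Bisection works because the left-hand side decreases as \(M\) increases. The function names, the integration range \([-10, 10]\) and the step count are illustrative choices only.

```python
import math


def normal_sf(z: float) -> float:
    """Upper tail 1 - Phi(z) of the standard normal law."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lhs_27(M: float, N: float, steps: int = 4000, half_width: float = 10.0) -> float:
    """Left-hand side of (27): P{y > M - N x^2} for independent standard normal x, y.

    The outer integral over x uses Simpson's rule on [-half_width, half_width]
    (steps must be even); the inner integral is the normal tail in closed form.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * density * normal_sf(M - N * x * x)
    return total * h / 3.0


def coefficient_N(sigma1_sq: float, sigma2: float, r: float, n: int) -> float:
    """N of (28)."""
    return sigma1_sq * math.sqrt(n) / (sigma2 * math.sqrt(1.0 - r * r))


def solve_M(N: float, alpha: float, tol: float = 1e-8) -> float:
    """Solve (27) for M by bisection; lhs_27 decreases as M increases."""
    lo, hi = -50.0, 50.0 + 50.0 * N  # lhs_27 is ~1 at lo and ~0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_27(mid, N) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Constants of Example I of this paper: sigma1^2 = 1/2, sigma2^2 = 5/8, r = 0.
N = coefficient_N(0.5, math.sqrt(5.0 / 8.0), 0.0, n=100)
M = solve_M(N, alpha=0.05)
print(f"N = {N:.4f}, M = {M:.4f}, check: {lhs_27(M, N):.6f}")
```

With \(N = 0\) the region reduces to \(y > M\), so the solver must return the upper 5 per cent point of the normal law. This gives a simple sanity check.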

To prove Proposition I, we must first prove that, whatever \(n\), the region \(\bar w_n\) determined by the inequality (26) satisfies the condition (d) in Definition 2. The proof is based on the following Lemma.⁴

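LEMMA. If \(F_0, F_1, \dots, F_m\) are functions of \(x_1, \dots, x_n\) integrable over any region in \(W_n\), and \(w_0\) is a region in \(W_n\) such that within \(w_0\)

\[ F_0 > \sum_{i=1}^m a_i F_i, \tag{29} \]

while outside of \(w_0\)

\[ F_0 \le \sum_{i=1}^m a_i F_i, \tag{30} \]

\(a_1, a_2, \dots, a_m\) being some constant coefficients, then, whatever may be any other region \(w\) in \(W_n\) such that

\[ \int\cdots\int_w F_i\,dx_1\cdots dx_n = \int\cdots\int_{w_0} F_i\,dx_1\cdots dx_n, \qquad i = 1, 2, \dots, m, \tag{31} \]

we shall have

\[ \int\cdots\int_{w_0} F_0\,dx_1\cdots dx_n \ge \int\cdots\int_w F_0\,dx_1\cdots dx_n. \tag{32} \]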
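The Lemma can be checked by brute force on a finite space, where the integrals become sums. The sketch below is a toy illustration, not part of the paper. It uses ten points with made-up integer masses \(F_0\) and \(F_1\) (so \(m = 1\)) and an arbitrary coefficient \(a_1 = 3/2\). Exact arithmetic via `fractions.Fraction` lets the side condition (31) be tested for exact equality. The code enumerates every subset \(w\) that satisfies (31) and confirms that none of them exceeds \(w_0\) in the sense of (32).

```python
from fractions import Fraction
from itertools import combinations

# Toy finite "space" of ten points; F0 and F1 are made-up integer masses.
F0 = [5, 1, 4, 2, 7, 3, 6, 2, 8, 1]
F1 = [2, 1, 3, 2, 2, 1, 3, 1, 2, 3]
A1 = Fraction(3, 2)  # arbitrary constant coefficient a_1 in (29)-(30)


def total(F, region):
    """Sum of F over a region (finite analogue of the integrals in (31)-(32))."""
    return sum(F[k] for k in region)


def check_lemma():
    points = range(len(F0))
    # Region w0 of (29): F0 > a_1 F1 inside; outside, F0 <= a_1 F1 as in (30).
    w0 = [k for k in points if F0[k] > A1 * F1[k]]
    target, best = total(F1, w0), total(F0, w0)
    checked = violations = 0
    for size in range(len(F0) + 1):
        for w in combinations(points, size):
            if total(F1, w) == target:      # side condition (31)
                checked += 1
                if total(F0, w) > best:     # would contradict (32)
                    violations += 1
    return w0, checked, violations


w0, checked, violations = check_lemma()
print(w0, checked, violations)
```

The reasoning behind the check takes one line. For any admissible \(w\), \(\sum_{w_0\setminus w}F_0 - \sum_{w\setminus w_0}F_0 \ge a_1\big(\sum_{w_0\setminus w}F_1 - \sum_{w\setminus w_0}F_1\big) = 0\). This holds whatever the sign of \(a_1\).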


PROOF OF PROPOSITION I. Denote, for simplicity, by \(p(E_n)\) the elementary probability law of the \(X\)'s as determined by the hypothesis tested. Comparing the statement of the Lemma with the definition (26) of \(\bar w_n\), we immediately see that this region has the following property: whatever may be any other region \(w\) in \(W_n\) such that

\[ \int\cdots\int_w p(E_n)\,dx_1\cdots dx_n = \int\cdots\int_{\bar w_n} p(E_n)\,dx_1\cdots dx_n \tag{33} \]

and

\[ \frac{1}{\sqrt n}\int\cdots\int_w \sum_{i=1}^n \varphi_i\, p(E_n)\,dx_1\cdots dx_n = \frac{1}{\sqrt n}\int\cdots\int_{\bar w_n} \sum_{i=1}^n \varphi_i\, p(E_n)\,dx_1\cdots dx_n, \tag{34} \]


⁴ J. Neyman and E. S. Pearson: loc. cit., pp. 10-11.











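we shall have

\[ \frac1n\int\cdots\int_{\bar w_n}\Big(\sum_{i=1}^n\psi_i + \Big(\sum_{i=1}^n\varphi_i\Big)^2\Big)p(E_n)\,dx_1\cdots dx_n \ge \frac1n\int\cdots\int_w\Big(\sum_{i=1}^n\psi_i + \Big(\sum_{i=1}^n\varphi_i\Big)^2\Big)p(E_n)\,dx_1\cdots dx_n. \tag{35} \]

But under the conditions (i) and (ii)

\[ p(E_n \mid \vartheta) = \prod_{i=1}^n p(x_i \mid \theta_0 + \vartheta/\sqrt n), \tag{36} \]

\[ \left.\frac{\partial p(E_n \mid \vartheta)}{\partial\vartheta}\right|_{\vartheta=0} = \frac{1}{\sqrt n}\sum_{i=1}^n \varphi_i\, p(E_n), \tag{37} \]

\[ \left.\frac{\partial^2 p(E_n \mid \vartheta)}{\partial\vartheta^2}\right|_{\vartheta=0} = \frac1n\Big(\sum_{i=1}^n\psi_i + \Big(\sum_{i=1}^n\varphi_i\Big)^2\Big)p(E_n), \tag{38} \]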


and it is easily seen that the relations (33), (34) and (35) are identical with (11), (12) and (10) respectively, and that therefore the region \(\bar w_n\) satisfies the condition (d) of Definition 2. It remains to prove that \(\bar w_n\) also satisfies the conditions (e) and (f), that is to say that, for \(n \to \infty\), the expressions on the right-hand sides of (33) and (34) tend to the prescribed values \(\alpha\) and zero respectively. This conclusion concerning (33) is a consequence of the theorem of S. Bernstein quoted above. To see this, write


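\[ u_n = \sum_{i=1}^n \varphi_i, \qquad v_n = \sum_{i=1}^n \psi_i, \tag{39} \]

and denote by \(s_0\) the region in the plane \(S\) of \((u, v)\) defined by the inequality

\[ v + u^2 > M\sigma_2\sqrt{n(1-r^2)} - n\sigma_1^2 + r\,\frac{\sigma_2}{\sigma_1}\,u, \tag{40} \]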


obtained from (26) by means of (39). The right-hand side of (33) represents the probability, determined by the hypothesis tested, of the \(X\)'s satisfying the inequality (26). But this inequality is satisfied simultaneously with the variates \(u_n\) and \(v_n\) satisfying (40). Therefore, if we denote by \(D_n\) the point in \(S\) with its coordinates equal to (39), then the right-hand side of (33) may be interpreted as the probability \(P\{D_n \in s_0\}\) of \(D_n\) falling within \(s_0\). Comparing (21)-(25) with (16), it is easily seen that, according to the Theorem of S. Bernstein, whatever may be \(\varepsilon > 0\), if \(n\) is sufficiently large, then


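\[ \Big|P\{D_n \in s_0\} - \iint_{s_0} G_n\,du\,dv\Big| < \varepsilon, \tag{41} \]

where

\[ G_n = \frac{1}{2\pi n\sigma_1\sigma_2\sqrt{1-r^2}}\exp\left\{-\frac{1}{2n(1-r^2)}\left[\frac{u^2}{\sigma_1^2} - 2r\,\frac{u\,(v+n\sigma_1^2)}{\sigma_1\sigma_2} + \frac{(v+n\sigma_1^2)^2}{\sigma_2^2}\right]\right\}. \tag{42} \]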




































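In fact, to what is given explicitly we need only add that, as

\[ \int p(x_i \mid \theta)\,dx_i = 1, \tag{43} \]

the derivative with respect to \(\theta\) of the left-hand side must be identically equal to zero. Therefore

\[ \left.\frac{\partial}{\partial\theta}\int p(x_i \mid \theta)\,dx_i\right|_{\theta=\theta_0} = \int \varphi_i\, p(x_i \mid \theta_0)\,dx_i = \mathcal{E}(\varphi_i) = 0, \tag{44} \]

where again the integrals are taken from \(-\infty\) to \(+\infty\). It follows further that the second derivative of (43) with respect to \(\theta\) must again be identically equal to zero. Therefore, keeping in mind the definitions of \(\varphi_i\) and \(\psi_i\), we may write

\[ \left.\frac{\partial^2}{\partial\theta^2}\int p(x_i \mid \theta)\,dx_i\right|_{\theta=\theta_0} = \int (\psi_i + \varphi_i^2)\, p(x_i \mid \theta_0)\,dx_i = 0, \tag{45} \]

and thus

\[ \mathcal{E}(\psi_i) = -\mathcal{E}(\varphi_i^2) = -\sigma_1^2. \tag{46} \]

Hence \(u_n\) and \(v_n\) have means \(0\) and \(-n\sigma_1^2\). These are the values of \(na\) and \(nb\) that enter (42).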


The proof that the right-hand side of (33) tends to \(\alpha\) as \(n \to \infty\) will be completed if we manage to reduce the integral of (42) over the region \(s_0\) to the integral (27). This is easily done by substituting

\[ x = \frac{u}{\sigma_1\sqrt n}, \tag{47} \]

\[ y = \frac{v + n\sigma_1^2 - r\sigma_2 u/\sigma_1}{\sigma_2\sqrt{n(1-r^2)}}. \tag{48} \]

Thus, if the coefficient \(M\) in (26) and (40) satisfies the condition (27), then the value of the integral of \(G_n\) in (41) is permanently equal to \(\alpha\), and this means that the right-hand side of (33) tends to \(\alpha\) as \(n \to \infty\).

Denote by \(p_n(u, v)\) the elementary probability law of \(u_n\) and \(v_n\). It will be noticed that, whatever \(s\) in \(S\),

\[ P\{D_n \in s\} = \iint_s p_n(u, v)\,du\,dv, \tag{49} \]


and that consequently, in the course of the above discussion, we have proved that, whatever \(\varepsilon > 0\), there exists a sufficiently large number \(N_\varepsilon\) such that \(n > N_\varepsilon\) implies

\[ \Big|\iint_s \big(p_n(u, v) - G_n\big)\,du\,dv\Big| < \varepsilon \tag{50} \]





whatever may be the region \(s\) in \(S\). We shall now use this circumstance to prove that, when \(n \to \infty\), the right-hand side of (34) tends to zero. It will be noticed first that

\[ \int\cdots\int_{\bar w_n}\Big(\sum_{i=1}^n\varphi_i\Big)^k p(E_n)\,dx_1\cdots dx_n = \iint_{s_0} u^k\, p_n(u, v)\,du\,dv \tag{51} \]

for \(k = 1, 2\). Further


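\[ \iint_{s_0} u^2\, p_n(u, v)\,du\,dv \le \iint_S u^2\, p_n(u, v)\,du\,dv = n\sigma_1^2. \tag{52} \]

Using the inequality of Schwarz,⁵ we may write

\[ \Big|\frac{1}{\sqrt n}\iint_{s_0} u\,\big(p_n(u, v) - G_n\big)\,du\,dv\Big| \le \frac{1}{\sqrt n}\left(\iint_{s_0} u^2\,|p_n(u, v) - G_n|\,du\,dv \cdot \iint_{s_0} |p_n(u, v) - G_n|\,du\,dv\right)^{1/2}. \tag{53} \]

Now, it is easy to calculate that

\[ \iint_{s_0} u^2\,|p_n(u, v) - G_n|\,du\,dv < 2n\sigma_1^2. \tag{54} \]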
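On the other hand, if \(n\) is so large that (50) holds good for any region \(s\) in \(S\), and \(s_+\) and \(s_-\) denote the two parts of \(s_0\) where \(p_n(u, v) - G_n\) is respectively positive and negative, then

\[ 0 \le \iint_{s_0} |p_n(u, v) - G_n|\,du\,dv = \iint_{s_+} \big(p_n(u, v) - G_n\big)\,du\,dv - \iint_{s_-} \big(p_n(u, v) - G_n\big)\,du\,dv < 2\varepsilon, \tag{55} \]

and it follows that, for such large values of \(n\),

\[ \Big|\frac{1}{\sqrt n}\iint_{s_0} u\,\big(p_n(u, v) - G_n\big)\,du\,dv\Big| < 2\sigma_1\sqrt\varepsilon. \tag{56} \]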
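For readers who want to verify (56), the step from (53) consists of inserting the bounds (54) and (55) into its right-hand side. This is an editorial expansion of the argument. The bound (54) itself follows from \(|p_n - G_n| \le p_n + G_n\), from (52), and from the fact that \(u\) has variance \(n\sigma_1^2\) under \(G_n\), so that \(\iint_S u^2 G_n\,du\,dv = n\sigma_1^2\):

\[
\frac{1}{\sqrt n}\Big|\iint_{s_0} u\,(p_n - G_n)\,du\,dv\Big| < \frac{1}{\sqrt n}\big(2n\sigma_1^2\cdot 2\varepsilon\big)^{1/2} = \frac{2\sqrt n\,\sigma_1\sqrt\varepsilon}{\sqrt n} = 2\sigma_1\sqrt\varepsilon .
\]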


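On the other hand, using the transformation (47) and (48), we find that

\[ \frac{1}{\sigma_1\sqrt n}\iint_{s_0} u\,G_n\,du\,dv = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x\,e^{-x^2/2}\left\{\frac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-y^2/2}\,dy\right\}dx, \tag{57} \]

which is consequently permanently equal to zero, the integrand being an odd function of \(x\). As \(\varepsilon\) is an arbitrarily small number, it follows that

\[ \lim_{n\to\infty}\frac{1}{\sqrt n}\iint_{s_0} u\,p_n(u, v)\,du\,dv = \lim_{n\to\infty}\frac{1}{\sqrt n}\int\cdots\int_{\bar w_n}\sum_{i=1}^n\varphi_i\,p(E_n)\,dx_1\cdots dx_n = 0, \tag{58} \]

which completes the proof of Proposition I.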


⁵ See for example: S. Kaczmarz and H. Steinhaus, Theorie der Orthogonalreihen, Warsaw, 1935, p. 10.







PROPOSITION II. If the conditions of Proposition I are satisfied but either \(|r| = 1\) or \(\psi_i\) is independent of \(x_i\), then the test of the hypothesis \(H_0\) which is unbiased in the limit and which corresponds to the level of significance \(\alpha\) is determined by the sequence of critical regions \(\bar w_n\) defined by the inequality

\[ \Big|\sum_{i=1}^n \varphi_i\Big| > \lambda\sigma_1\sqrt n, \tag{59} \]

where \(\lambda\) satisfies the equation

\[ \frac{1}{\sqrt{2\pi}}\int_{-\lambda}^{+\lambda} e^{-x^2/2}\,dx = 1 - \alpha. \tag{60} \]
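Equation (60) says that \(\lambda\) is the upper \(\alpha/2\) point of the standard normal law, so any normal quantile routine will give it. The following is a minimal sketch using Python's `statistics.NormalDist`; the function names are illustrative. As a usage example it takes the exponential law of Example II below. There \(\varphi_i = 1 - x_i\) by (103), and \(\sigma_1^2 = 1\), because \(\sigma_1^2 = \mathcal{E}(1 - x_i)^2\) is the variance of \(x_i\) when \(\theta = 1\). The two samples are made up for illustration.

```python
import math
from statistics import NormalDist


def lambda_from_60(alpha: float) -> float:
    """lambda of (60): (-lambda, lambda) has standard normal probability 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_59(phi, sigma1: float, alpha: float) -> bool:
    """Critical region (59): reject H0 when |sum of phi_i| > lambda * sigma1 * sqrt(n)."""
    n = len(phi)
    return abs(sum(phi)) > lambda_from_60(alpha) * sigma1 * math.sqrt(n)


# Usage with Example II (exponential law, theta_0 = 1): phi_i = 1 - x_i, sigma1 = 1.
xs_near_h0 = [1.0] * 100   # made-up sample whose mean is exactly 1
xs_far = [2.0] * 100       # made-up sample whose mean is 2
print(round(lambda_from_60(0.05), 4))                      # 1.96
print(reject_59([1 - x for x in xs_near_h0], 1.0, 0.05))   # False
print(reject_59([1 - x for x in xs_far], 1.0, 0.05))       # True
```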
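PROOF. We notice first that the condition \(|r| = 1\) and the equation (44) imply

\[ \Big[\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i\Big]^2 - \int \varphi_i^2\,p(x_i \mid \theta_0)\,dx_i \int (\psi_i + \sigma_1^2)^2\,p(x_i \mid \theta_0)\,dx_i = 0, \tag{61} \]

\[ \frac{\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i}{\int \varphi_i^2\,p(x_i \mid \theta_0)\,dx_i} = \frac{\int (\psi_i + \sigma_1^2)^2\,p(x_i \mid \theta_0)\,dx_i}{\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i} = A, \tag{62} \]

and therefore

\[ \int \big[(\psi_i + \sigma_1^2)^2 - A\varphi_i(\psi_i + \sigma_1^2)\big]\,p(x_i \mid \theta_0)\,dx_i = 0, \tag{63} \]

\[ \int \varphi_i\big[(\psi_i + \sigma_1^2) - A\varphi_i\big]\,p(x_i \mid \theta_0)\,dx_i = 0, \tag{64} \]

and finally, subtracting \(A\) times (64) from (63),

\[ \int (\psi_i + \sigma_1^2 - A\varphi_i)^2\,p(x_i \mid \theta_0)\,dx_i = 0, \tag{65} \]

which means that at almost every value of \(x_i\) for which \(p(x_i \mid \theta_0) \neq 0\),

\[ \psi_i + \sigma_1^2 = A\varphi_i. \tag{66} \]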


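It follows that the inequality (10) in Definition 2 of the test which is unbiased in the limit reduces to the following:

\[ \frac1n\int\cdots\int_{\bar w_n}\Big(\sum_{i=1}^n\varphi_i\Big)^2 p(E_n \mid \theta_0)\,dx_1\cdots dx_n \ge \frac1n\int\cdots\int_{w_n}\Big(\sum_{i=1}^n\varphi_i\Big)^2 p(E_n \mid \theta_0)\,dx_1\cdots dx_n, \tag{67} \]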







owing to (11), (12), (37) and (38). On the other hand, the inequality (59) is equivalent to

\[ \Big(\sum_{i=1}^n\varphi_i\Big)^2 > a + b\sum_{i=1}^n\varphi_i \tag{68} \]

with \(a = \lambda^2\sigma_1^2 n\) and \(b = 0\). Referring to the Lemma, we conclude that the regions \(\bar w_n\) satisfy the condition (d) of Definition 2. It remains to show that they also satisfy the conditions (e) and (f). This follows immediately from the theorem of Liapounoff⁶ and the reasoning which we used above in order to prove (58).

If \(\psi_i\) does not depend on \(x_i\), then, owing to (38) and (11), the inequality (10) immediately reduces to (67), and the proof of Proposition II follows exactly the same lines as before.


5. Limiting power function. To know the properties of a test undoubtedly means to know (i) how frequently this particular test will reject the hypothesis tested when it is in fact true, and (ii) how frequently it will detect its falsehood when it is wrong. Information of this kind is provided by the properties of the so-called power function of the test. This has been defined⁷ as follows. Let \(w_n\) be any critical region and, as formerly, \(P\{E_n \in w_n \mid \theta\}\) the probability of \(E_n\) falling within \(w_n\) as determined by a specified value of \(\theta\). If \(w_n\) is fixed, then \(P\{E_n \in w_n \mid \theta\}\) will be a function of \(\theta\) only. To emphasize this circumstance we may introduce a new symbol, writing

\[ P\{E_n \in w_n \mid \theta\} = \beta(\theta \mid w_n), \tag{69} \]

which will mean that in the above formula \(w_n\) is kept constant and \(\theta\) varied. The function \(\beta(\theta \mid w_n)\) thus defined is called the power function of the critical region \(w_n\), or that of the test based on \(w_n\). If \(w_n\) corresponds to the level of significance \(\alpha\) and \(\theta_0\) is the value of \(\theta\) specified by the hypothesis tested \(H_0\), then

\[ \beta(\theta_0 \mid w_n) = \alpha, \tag{70} \]


and it will be noticed that this is the probability of rejecting \(H_0\) when it is in fact true. As we reject \(H_0\) only in those cases when \(E_n \in w_n\), the values of \(\beta(\theta \mid w_n)\) corresponding to other values \(\theta \neq \theta_0\) are equal to the probability of detecting the falsehood of the hypothesis \(H_0\) when \(\theta\) has any specified value different from \(\theta_0\). The larger the value of \(\beta(\theta \mid w_n)\) at a given \(\theta\), the greater will be the "detecting power" of the test, which justifies the name attached to the function \(\beta(\theta \mid w_n)\). Until the present time the power functions of only a few tests have been studied, and it follows that we know comparatively little of the properties of tests, even those in frequent use. The first study of this kind was concerned with the power function of "Student's" test as applied to the problem of one sample, and there are three publications giving various


⁶ See for example Paul Lévy: Théorie de l'addition des variables aléatoires. Paris, 1937, pp. 101-107.

⁷ J. Neyman and E. S. Pearson: loc. cit., p. 9.







numerical tables.⁸ However, in these publications the term "power function" does not yet appear. Apart from the joint paper already referred to, where the term "power function" was first defined, we may mention a few papers in Biometrika, the most important of which seems to be that by S. S. Wilks and Catherine M. Thompson.⁹ The purpose of studying the power function of any test is to be able to answer the following three questions:

(a) What should be the size of a sample in order to have a reasonable chance of detecting the falsehood of the hypothesis tested, when the error in the parameters that it specifies has some stated value?

(b) If in some particular case a test failed to reject the hypothesis tested (which, of course, does not mean that it is necessarily true), is it likely that the error in \(\theta_0\) does not exceed some specified limit \(\Delta\)?

(c) Two different tests corresponding to the same level of significance are suggested for the same hypothesis \(H_0\); which shall we use?

In this last case the answer is obvious: the one which gives the greater chance of detecting the falsehood of the hypothesis tested in cases when it is wrong. But to know this we must know the power functions of both tests.

For the above reasons it seems important to study the power function of the test unbiased in the limit as defined above. As in this case the elementary probability laws are not specified, it is obviously impossible to find an explicit formula giving the power function. Therefore we shall endeavour to find its limiting form. This will be done by means of the two following theorems.

Consider an infinite sequence of situations

\[ S_1, S_2, \dots, S_m, \dots \tag{71} \]

In each of these situations we shall have to test the same hypothesis \(H_0\) concerning the probability law \(p(x \mid \theta)\) and specifying the value \(\theta_0\) of \(\theta\). The situations differ among themselves by the number of the \(X\)'s and by the hypotheses, alternative to \(H_0\), which are considered. For the situation \(S_m\) we shall denote them by \(n_m\) and \(H_m\) respectively. We shall assume that \(\lim n_m = +\infty\) when \(m \to \infty\). As to the hypothesis \(H_m\), we shall assume that the value \(\theta_m\) which it ascribes to the parameter \(\theta\) is

\[ \theta_m = \theta_0 + \frac{\vartheta}{\sqrt{n_m}}, \tag{72} \]


⁸ (1) S. Kołodziejczyk: Sur l'erreur de la seconde catégorie dans le problème de "Student." C. R. Académie des Sciences, Paris, t. 197 (1933), p. 814.

(2) J. Neyman with the co-operation of K. Iwaszkiewicz and S. Kołodziejczyk: Statistical Problems in Agricultural Experimentation. Suppl. Journ. Roy. Stat. Soc., Vol. II (1935), pp. 107-180.

(3) J. Neyman and B. Tokarska: Errors of Second Kind in Testing "Student's" Hypothesis. J. A. S. A., Vol. 31 (1936), pp. 320-334.

⁹ S. S. Wilks and Catherine M. Thompson: The Sampling Distribution of the Criterion \(\lambda_{H_1}\) when the Hypothesis Tested is not True. Biometrika, Vol. XXIX (1937), pp. 124-132.







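where \(\vartheta\), the standardized error in \(\theta_0\), is kept constant. We shall assume that in each situation \(S_m\) we test the hypothesis \(H_0\) by means of the test unbiased in the limit and corresponding to the level of significance \(\alpha\). The power function of this test should be denoted by \(\beta(\theta \mid \bar w_{n_m})\), but to simplify the notation we will write simply \(\beta_m(\theta)\). We shall be concerned with the value \(\beta_m(\theta_m)\) of this function at the point \(\theta = \theta_m\), and we shall prove the following proposition.

PROPOSITION III. If the third logarithmic derivative of \(p(x_i \mid \theta)\) with respect to \(\theta\) is bounded,

\[ \left|\frac{\partial^3 \log p(x_i \mid \theta)}{\partial\theta^3}\right| < C = \text{constant}, \tag{73} \]

and \(|r| < 1\), then

\[ \lim_{m\to\infty}\beta_m(\theta_m) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-(x-\vartheta\sigma_1)^2/2}\left\{\frac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-y^2/2}\,dy\right\}dx. \tag{74} \]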


This proposition is analogous to that¹⁰ concerning the "smooth" test for goodness of fit. It could be used in the following manner.

When testing the hypothesis \(H_0\) and using for the purpose a certain number \(n\) of observations, we find ourselves in a situation which might be considered as one of the sequence (71). If \(n\) is large, we may hope that the right-hand side of (74), with \(N\) given by (28) and \(M\) by (27) for that value of \(n\), will give a reasonable approximation to the actual value of the power function corresponding to the value of \(\theta\) calculated from (72) by substituting \(n_m = n\) in it.
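The approximation just described can be sketched numerically. The code below illustrates the formula under stated assumptions and is not the author's procedure. It first solves (27) for \(M\) at the given \(N\) by bisection. It then evaluates the right-hand side of (74) by Simpson's rule in \(x\), with the inner integral in closed form. The parameter values in the usage lines are the constants of Example I below (\(\sigma_1^2 = 1/2\), \(\sigma_2^2 = 5/8\), \(r = 0\)) with an illustrative \(n = 100\). The function names are hypothetical.

```python
import math


def normal_sf(z: float) -> float:
    """1 - Phi(z) for the standard normal law."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def limiting_power(M: float, N: float, shift: float, steps: int = 4000) -> float:
    """Right-hand side of (74) with shift = vartheta * sigma1.

    x ~ N(shift, 1), y ~ N(0, 1); region y > M - N x^2. Simpson's rule in x
    on [shift - 10, shift + 10], where the density of x is non-negligible.
    """
    a, b = shift - 10.0, shift + 10.0
    h = (b - a) / steps  # steps must be even
    total = 0.0
    for k in range(steps + 1):
        x = a + k * h
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        density = math.exp(-0.5 * (x - shift) ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * density * normal_sf(M - N * x * x)
    return total * h / 3.0


def solve_M(N: float, alpha: float, tol: float = 1e-8) -> float:
    """Solve (27), i.e. limiting_power(M, N, 0) = alpha, for M by bisection."""
    lo, hi = -50.0, 50.0 + 50.0 * N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if limiting_power(mid, N, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma1, sigma2, r, n, alpha = math.sqrt(0.5), math.sqrt(5.0 / 8.0), 0.0, 100, 0.05
N = sigma1 ** 2 * math.sqrt(n) / (sigma2 * math.sqrt(1.0 - r * r))  # (28)
M = solve_M(N, alpha)
for vartheta in (0.0, 1.0, 2.0, 4.0):
    print(vartheta, round(limiting_power(M, N, vartheta * sigma1), 4))
```

At \(\vartheta = 0\) the value returns to \(\alpha\), as (27) requires. The value grows with \(|\vartheta|\) because the rejection probability for given \(x\), \(1 - \Phi(M - Nx^2)\), is symmetric in \(x\) and increases with \(|x|\).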

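PROOF. Denote

\[ \frac{\partial^3 \log p(x_i \mid \theta)}{\partial\theta^3} = \chi_i(\theta). \tag{75} \]

We may write

\[ p(x_i \mid \theta_m) = p(x_i \mid \theta_0)\exp\left\{\frac{\vartheta\varphi_i}{\sqrt{n_m}} + \frac{\vartheta^2\psi_i}{2n_m} + \frac{\vartheta^3\chi_i(\tilde\theta_m)}{6n_m^{3/2}}\right\}, \tag{76} \]

where \(\tilde\theta_m\) denotes some value intermediate between \(\theta_0\) and \(\theta_m\). Consequently, taking into account (39), (47) and (48), we have

\[ p(E_{n_m} \mid \theta_m) = \prod_{i=1}^{n_m} p(x_i \mid \theta_m) = p(E_{n_m} \mid \theta_0)\,(1 + \varepsilon_m)\,e^{\vartheta\sigma_1 x - \frac12\vartheta^2\sigma_1^2}, \tag{77} \]

where

\[ \log(1 + \varepsilon_m) = \frac{\vartheta^2\sigma_2}{2\sqrt{n_m}}\Big(y\sqrt{1-r^2} + xr\Big) + \frac{\vartheta^3}{6n_m^{3/2}}\sum_{i=1}^{n_m}\chi_i(\tilde\theta_m). \tag{78} \]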


¹⁰ J. Neyman: "Smooth" Test for Goodness of Fit. Skandinavisk Aktuarietidskrift (1937), p. 186.













It is seen that, if \(m \to \infty\), then \(\varepsilon_m\) tends to zero uniformly in every bounded region of the plane \(S\) of \(x\) and \(y\). Denote by \(s\) any bounded region in \(S\) and by \(W_{n_m}(s)\) a region in \(W_{n_m}\) of which \(s\) is a transformation by means of the formulae (39), (47) and (48). The probability of \(E_{n_m}\) falling within \(W_{n_m}(s)\) is equal to that of the point with coordinates \(x\) and \(y\) falling within \(s\). The former of these probabilities is represented by the integral of (77) over \(W_{n_m}(s)\), and the latter by the integral taken over \(s\) of the elementary probability law \(p_m(x, y \mid \theta_m)\) of \(x\) and \(y\) corresponding to the value \(\theta_m\) of \(\theta\). Owing to the formula (77) we may write















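\[ p_m(x, y \mid \theta_m) = p_m(x, y \mid \theta_0)\,(1 + \eta_m)\,e^{\vartheta\sigma_1 x - \frac12\vartheta^2\sigma_1^2}, \tag{79} \]

where, owing to (78), \(\eta_m\) tends uniformly to zero in \(s\) as \(m \to \infty\). Remembering the connection between \(u_n, v_n\) and \(x, y\), and also the inequality (50), which is valid for sufficiently large values of \(n\), we conclude that

\[ p_m(x, y \mid \theta_0) = \frac{1}{2\pi}\,e^{-\frac12(x^2+y^2)} + Q_m, \tag{80} \]

where \(Q_m\) has the property that, whatever be \(\varepsilon > 0\), for sufficiently large values of \(m\)

\[ \Big|\iint_s Q_m\,dx\,dy\Big| < \varepsilon, \tag{81} \]

where \(s\) is any bounded region in \(S\). It follows that

\[ p_m(x, y \mid \theta_m) = \left\{\frac{1}{2\pi}\,e^{-\frac12(x^2+y^2)} + Q_m\right\}e^{\vartheta\sigma_1 x - \frac12\vartheta^2\sigma_1^2}\,(1 + \eta_m), \tag{82} \]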


and that therefore, whatever be the bounded region \(s\),

\[ \lim_{m\to\infty}\iint_s p_m(x, y \mid \theta_m)\,dx\,dy = \frac{1}{2\pi}\iint_s e^{-\frac12[(x-\vartheta\sigma_1)^2 + y^2]}\,dx\,dy. \tag{83} \]

It is known, however, that whenever an integral probability law tends to a fixed limit uniformly within any bounded region, then it must do so within the whole space. It follows therefore that the formula (83) is valid for any region \(s\), whether bounded or not. But

\[ \beta_m(\theta_m) = \iint_{y > M - Nx^2} p_m(x, y \mid \theta_m)\,dx\,dy, \tag{84} \]

and it follows that

\[ \lim_{m\to\infty}\beta_m(\theta_m) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-(x-\vartheta\sigma_1)^2/2}\left\{\frac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-y^2/2}\,dy\right\}dx, \tag{85} \]

which completes the proof of Proposition III.












It is important to be clear about the exact meaning of Proposition III. Suppose, for example, that in a particular case \(\vartheta = \sigma_1 = 1\), and consider a sequence of situations in which

\[
\begin{aligned}
&n_1 = 100, && n_2 = 100^2, && \dots, && n_m = 100^m, && \dots\\
&\theta_1 = \theta_0 + .1, && \theta_2 = \theta_0 + .01, && \dots, && \theta_m = \theta_0 + (.1)^m, && \dots
\end{aligned} \tag{86}
\]

If this were the case, then Proposition III would be applicable and we could affirm that the sequence of the power functions \(\beta_m(\theta)\), each considered at the appropriate point \(\theta_m\), has a limit, represented by the double integral in the right-hand side of (85) with \(\vartheta\sigma_1 = 1\). Accordingly, if we were interested in the value of the power function at \(\theta' = \theta_0 + .02\) with \(n = 10000\) and \(\sigma_1 = 1\), then we could hope to obtain its approximate value by calculating the double integral in (85) with

\[ \vartheta = (\theta' - \theta_0)\sqrt n = 2. \tag{87} \]

These are legitimate conclusions. However, it would be wrong to consider as proved that, if in the same example we increase \(n\) to \(n' = 40000\), then the value of the power function at \(\theta = \theta'\) will be represented by its limit (85) with \(\vartheta = 4\) and with about the same accuracy as previously. It is just possible that to attain the same accuracy at \(\vartheta = 4\) a value of \(n\) greater than \(n'\) will be needed. This of course would imply a corresponding change in \(\theta'\).
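The arithmetic behind these statements is short, and spelling it out shows why the two cases differ. Along the sequence (86) the standardized error is the same in every situation, since

\[
\vartheta = (\theta_m - \theta_0)\sqrt{n_m} = (0.1)^m\sqrt{100^m} = (0.1)^m\,10^m = 1, \qquad m = 1, 2, \dots,
\]

whereas for the fixed alternative \(\theta' = \theta_0 + .02\)

\[
(\theta' - \theta_0)\sqrt{10000} = 0.02 \times 100 = 2, \qquad (\theta' - \theta_0)\sqrt{40000} = 0.02 \times 200 = 4 .
\]

Raising \(n\) with \(\theta'\) held fixed therefore changes \(\vartheta\). Proposition III, on the other hand, concerns sequences along which \(\vartheta\) stays constant.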

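PROPOSITION IV. If the conditions of Proposition III are satisfied but either \(|r| = 1\) or \(\psi_i\) is independent of \(x_i\), then

\[ \lim_{m\to\infty}\beta_m(\theta_m) = 1 - \frac{1}{\sqrt{2\pi}}\int_{-\lambda}^{+\lambda} e^{-(x-\vartheta\sigma_1)^2/2}\,dx. \tag{88} \]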


The proof of this proposition is quite analogous to that of Proposition III. 
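Unlike (74), the limit (88) can be written in closed form in terms of the normal distribution function \(\Phi\). The integral of the normal density with mean \(\vartheta\sigma_1\) over \((-\lambda, \lambda)\) equals \(\Phi(\lambda - \vartheta\sigma_1) - \Phi(-\lambda - \vartheta\sigma_1)\). The following is a minimal sketch using Python's `statistics.NormalDist`, with \(\lambda\) taken from (60); the function name is illustrative.

```python
from statistics import NormalDist


def limiting_power_88(vartheta_sigma1: float, alpha: float) -> float:
    """Right-hand side of (88), with lambda determined by (60)."""
    z = NormalDist()
    lam = z.inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - (z.cdf(lam - vartheta_sigma1) - z.cdf(-lam - vartheta_sigma1))


for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, round(limiting_power_88(shift, 0.05), 4))
```

At \(\vartheta = 0\) this returns \(\alpha\). When \(\vartheta\sigma_1 = \lambda\) it is just above one half, since the far tail then contributes only \(\Phi(-2\lambda)\).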


6. Examples. 
EXAMPLE 1. Consider the case where it is known for certain that

\[ p(x_i \mid \theta) = \frac{1}{\pi}\,\frac{1}{1 + (x_i - \theta)^2}, \qquad -\infty < x_i < +\infty, \tag{89} \]

but where the actual value of \(\theta\) is doubtful and it is desired to test the hypothesis \(H_0\) that \(\theta = \theta_0 = 0\), the alternative possibilities being both \(\theta < 0\) and \(0 < \theta\). Before applying the test unbiased in the limit, it is natural to try the unbiased test of type A. The critical region \(w_0\) of this test is defined by the inequality

\[ \sum_{i=1}^n \psi_i + \Big(\sum_{i=1}^n \varphi_i\Big)^2 > a + b\sum_{i=1}^n \varphi_i, \tag{90} \]

where the constants \(a\) and \(b\) must be found so as to satisfy the conditions

\[ \int\cdots\int_{w_0} p(E_n \mid 0)\,dx_1\cdots dx_n = \alpha, \tag{91} \]

\[ \int\cdots\int_{w_0} \sum_{i=1}^n \varphi_i\, p(E_n \mid 0)\,dx_1\cdots dx_n = 0. \tag{92} \]





$4 J. NEYMAN 


The technical difficulties involved in this problem are considerable and this 
may induce us to apply the test unbiased in the limit. Following the above 
theory we have 


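(93)   $\varphi_i = \dfrac{2x_i}{1 + x_i^2}$

(94)   $\psi_i = \dfrac{4x_i^2}{(1 + x_i^2)^2} - \dfrac{2}{1 + x_i^2}$

(95)   $\chi_i(\theta) = \dfrac{16(x_i - \theta)^3}{\{1 + (x_i - \theta)^2\}^3} - \dfrac{12(x_i - \theta)}{\{1 + (x_i - \theta)^2\}^2}$


It is easily seen that all the limiting conditions of the theory are fulfilled and
that, in particular, |χ_i(θ)| cannot exceed a fixed limit, approximately equal to 3.
We have further


(96)   $\mathcal{E}(\varphi_i) = 0, \qquad \sigma_1^2 = \mathcal{E}(\varphi_i^2) = -\mathcal{E}(\psi_i) = \tfrac{1}{2}$


Similarly

(97)   $\mathcal{E}\{(\psi_i + \sigma_1^2)^2\} = \tfrac{5}{8} = \sigma_2^2$

(98)   $\mathcal{E}(\varphi_i \psi_i) = 0 = r$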
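The values in (96)-(98) and the bound on |χ_i(θ)| can be checked numerically. This check is ours, not part of Neyman's argument. It assumes the standard Cauchy law (89) at θ_0 = 0 and uses the substitution x = tan s, which turns each expectation into a plain average over s. The maximum of |χ| is taken on a grid over −5 ≤ u ≤ 5. The output should be close to σ_1² = 1/2, σ_2² = 5/8, r = 0, and a maximum |χ| of about 2.91.

```python
import math

# phi, psi, chi of (93)-(95), written in terms of u = x_i - theta (theta_0 = 0).
def phi(u):
    return 2 * u / (1 + u * u)

def psi(u):
    return 4 * u * u / (1 + u * u) ** 2 - 2 / (1 + u * u)

def chi(u):
    return 16 * u**3 / (1 + u * u) ** 3 - 12 * u / (1 + u * u) ** 2

def cauchy_mean(g, m=20_000):
    """E[g(X)] for a standard Cauchy X.

    With x = tan(s), the element dx / (pi (1 + x^2)) becomes ds / pi on
    (-pi/2, pi/2), so the expectation is the average of g(tan s) over s.
    The midpoint rule is used; for the trigonometric polynomials that arise
    here it is exact up to rounding.
    """
    h = math.pi / m
    return math.fsum(g(math.tan(-math.pi / 2 + (k + 0.5) * h)) for k in range(m)) / m

sigma1_sq = cauchy_mean(lambda u: phi(u) ** 2)                 # compare (96)
minus_mean_psi = -cauchy_mean(psi)                              # compare (96)
sigma2_sq = cauchy_mean(lambda u: (psi(u) + sigma1_sq) ** 2)    # compare (97)
r = cauchy_mean(lambda u: phi(u) * psi(u))                      # compare (98)
chi_max = max(abs(chi(k / 1000)) for k in range(-5000, 5001))  # bound on |chi|

print(sigma1_sq, minus_mean_psi, sigma2_sq, r, chi_max)
```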


It follows that the regions w_n, the sequence of which determines the test
that is unbiased in the limit, are defined by the inequality


= 1 = x; /5n 
99) 4 ~$2, —, +4 +.) >M4/=-; 
™ > oS ag yiggt (22 3) 2 V3 


where M should be calculated so as to satisfy (27) with 


(100) N= Ws 2n 


In order to test the hypothesis H_0 we must therefore observe the values
x_1, x_2, ..., x_n and substitute them into the left-hand side of (99). If the
inequality is satisfied, the hypothesis should be rejected.

Approximate values of the power function could be obtained from the right-hand
side of (85) with


(101)   $\vartheta\sigma_1 = \vartheta / \sqrt{2}$


EXAMPLE II. Let us assume as given that


(102)   $p(x_i \mid \theta) = \theta e^{-\theta x_i}$ for $0 < x_i$, and $p(x_i \mid \theta) = 0$ elsewhere,





UNBIASED TESTS OF STATISTICAL HYPOTHESES 85 


with θ > 0, the hypothesis to be tested being that θ = θ_0 = 1, with the alternatives
both θ < 1 and θ > 1.

In this particular example the unbiased test of type A is easily found” and 
moreover’ it has also the property of being of type A,. But this circumstance 
does not diminish the illustrative character of the example. We have 


(103)   $\varphi_i = 1 - x_i$
(104)   $\psi_i = -1 = \text{constant}$


It follows that the regions forming the test which is unbiased in the limit are 
determined by the inequality (59). We have further 


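(105)   $\sigma_1^2 = \mathcal{E}(\varphi_i^2) = \int_0^\infty (1 - x)^2 e^{-x}\,dx = 1$

and the inequality (59) reduces to

(106)   $\left| \sum_{i=1}^{n} (1 - x_i) \right| > \lambda\sqrt{n}$

with λ taken from the tables of the normal integral according to (60) and to the
chosen value of α. Approximate values of the power function can be calculated
from, say,


(107)   $\beta_\infty(\vartheta) = 1 - \dfrac{1}{\sqrt{2\pi}} \int_{-\lambda}^{+\lambda} e^{-\frac{1}{2}(x - \vartheta)^2}\,dx$


with

(108)   $\vartheta = (\theta - 1)\sqrt{n}$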


The simplicity of this example lets us calculate the exact
power function of the test, and it may be interesting to obtain its limit β_∞(ϑ) in
another and more direct way. Write


(109)   $y = \sum_{i=1}^{n} x_i$


It is known that, if the probability law of each of the x's is given by (102),
then the probability law of y is


(110)   $p(y \mid \theta) = \dfrac{\theta^n y^{n-1} e^{-\theta y}}{(n-1)!}$ for $0 < y$, and $p(y \mid \theta) = 0$ otherwise.


J. Neyman and E. S. Pearson, loc. cit., p. 18 et seq.
J. Neyman: Estimation statistique traitée comme un problème de probabilité classique.
Series Actualités scientifiques et industrielles. Paris (1938). (In the press.)







It follows that the exact form of the power function corresponding to the test
(106) is


(111)   $\beta(\theta \mid w_n) = 1 - \dfrac{\theta^n}{(n-1)!} \int_{n - \lambda\sqrt{n}}^{n + \lambda\sqrt{n}} y^{n-1} e^{-\theta y}\,dy$

For values of n of about 100 or more, and for values of θ close to unity, the
distribution of, say,


(112)   $\dfrac{\theta \sum_{i=1}^{n} x_i - n}{\sqrt{n}} = \dfrac{\theta y - n}{\sqrt{n}}$

is practically normal, with mean equal to zero and S.D. equal to unity. It
follows that the integral in the right-hand side of (111) is practically equal to the
normal integral taken within the limits obtained by substituting in
(112) the limits of y in (111). After some easy transformation we have, with
considerable accuracy,


(113)   $\beta(\theta \mid w_n) = 1 - \dfrac{1}{\sqrt{2\pi}} \int_{(\theta - 1)\sqrt{n} - \lambda\theta}^{(\theta - 1)\sqrt{n} + \lambda\theta} e^{-\frac{1}{2}z^2}\,dz$


or, after some further transformations and taking into account (108),


(114)   $\beta(\theta \mid w_n) = 1 - \dfrac{1}{\sqrt{2\pi}} \int_{-\lambda(1 + \vartheta/\sqrt{n})}^{+\lambda(1 + \vartheta/\sqrt{n})} e^{-\frac{1}{2}(u - \vartheta)^2}\,du$


and it is seen that, when ϑ is fixed and n increases indefinitely, β(θ | w_n)
does tend to β_∞(ϑ).
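The exact power (111) can be compared directly with its normal approximations (113) and (107). The sketch below makes three assumptions. First, λ is taken as the two-sided normal critical value for α = .05, which is our reading of (60). Second, θy follows the Gamma law (110) with θ = 1, so its distribution function has the closed Erlang form. Third, the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for the normal integrals

def erlang_cdf(n: int, y: float) -> float:
    """P(Y <= y) when Y has density y**(n-1) e**(-y) / (n-1)!, i.e. (110) with theta = 1."""
    if y <= 0:
        return 0.0
    # P(Y > y) = sum_{k<n} e^{-y} y^k / k!; each term is formed in log space to avoid overflow.
    tail = math.fsum(math.exp(-y + k * math.log(y) - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail

def exact_power(theta: float, n: int, lam: float) -> float:
    """Power (111) of the test (106): reject when |n - y| > lam * sqrt(n)."""
    lo, hi = n - lam * math.sqrt(n), n + lam * math.sqrt(n)
    # theta * y follows (110) with theta = 1, so rescale the acceptance limits.
    return 1.0 - (erlang_cdf(n, theta * hi) - erlang_cdf(n, theta * lo))

def approx_power(theta: float, n: int, lam: float) -> float:
    """Normal approximation (113)."""
    centre = (theta - 1.0) * math.sqrt(n)
    return 1.0 - (STD.cdf(centre + lam * theta) - STD.cdf(centre - lam * theta))

def limit_power(vartheta: float, lam: float) -> float:
    """Limiting power (107), with vartheta = (theta - 1) * sqrt(n) as in (108)."""
    return 1.0 - (STD.cdf(lam - vartheta) - STD.cdf(-lam - vartheta))

alpha = 0.05
lam = STD.inv_cdf(1 - alpha / 2)  # assumed reading of (60)
n = 100
for theta in (0.8, 1.0, 1.2):
    print(theta, exact_power(theta, n, lam), approx_power(theta, n, lam),
          limit_power((theta - 1) * math.sqrt(n), lam))
```

At θ = 1 the normal forms return α exactly, while the exact Gamma computation differs slightly because the law of y is skewed.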


UNIVERSITY COLLEGE, LONDON. 





THE TRANSFORMATION OF STATISTICS TO SIMPLIFY THEIR 
DISTRIBUTION* 


By HAROLD HOTELLING AND LESTER R. FRANKEL


1. Introduction. The custom of regarding a result as significant if it exceeds
two or three times its standard error has now given way, among informed statisticians,
to a consideration of the exact probabilities associated with the distribution
of the statistic in question. For example, in such problems as
examining the significance of the difference between the means of two samples,
particularly small samples, it is no longer adequate to regard the difference of
means, divided by the sample estimate of its standard error, as normally distributed.
The significance of this ratio, "Student's ratio," is judged instead by
the value of


(1)   $P = 2\int_t^\infty \phi_n(z)\,dz$


where n is the number of degrees of freedom entering into the estimate of
variance, and


(2)   $\phi_n(z) = \dfrac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \dfrac{z^2}{n}\right)^{-\frac{n+1}{2}}$


If the probability law underlying the observations themselves is normal, and
they are independent, P is the exact probability that the value of t obtained is
equalled or exceeded in absolute value, on the hypothesis that there is no real difference between
the means.

Methods of approximating P have been studied by R. A. Fisher¹ and by
W. A. Hendricks,² and tables have been presented by Student³ and Fisher.⁴
Nevertheless, the practical statistician will very frequently wish to make
judgments of significance without stopping to consult a table, or laboriously to
compute P, and will tend to revert to the former inaccurate but convenient
practice of treating t as normally distributed with unit variance. The essential


* Presented at the joint meeting at Indianapolis of the American Mathematical Society
and the Institute of Mathematical Statistics, December 30th, 1937.

1 Expansion of Student's Integral in Powers of n⁻¹. Metron, vol. 5 (1925).

2 Annals of Mathematical Statistics, vol. 7 (1936), pp. 210-221.

3 New Tables for Testing the Significance of Observations. Metron, vol. 5 (1925).

4 Statistical Methods for Research Workers, Oliver and Boyd, 1925-1936. Tables IV and
VI.
VI. 


reason for this is that the normal distribution, to which that of t approximates
for large values of n, has only one parameter in the expression for the probability.
Hence it is easy to remember a few important values, such as those corresponding
to P = .01 and .05; and when values of P representing other levels of significance
are in question, the single-entry tables of the normal probability
integral are more easily available and easier to use than the double-entry table of
Student's integral. Indeed, t is a more useful statistic than Student's original
ratio of mean to sample standard deviation, to which it is in the simplest case
proportional, partly because for large samples t closely approximates a
normally distributed variate of unit variance.

For more complicated statistics the practical need for something simpler
than the exact distribution is even more urgent, on account of the larger number
of parameters involved in the distributions. For example, the large class of
problems giving rise to probabilities expressible as incomplete beta functions
requires, for exactitude, the use of Pearson's extensive triple-entry table,⁵ and
even this is inadequate for some ranges of the parameters. The shorter tables of
R. A. Fisher⁶ and of Snedecor⁷ are helpful, but are also necessarily of triple entry.

It is a common practice, for example among economists and psychologists, to
select, either by graphic methods or by preliminary calculation, that one out of
many tests that might be applied to available data for which P is the least.
Such selection evidently introduces a bias, which is the more subtle because the
tests giving high and therefore insignificant probabilities are likely to be forgotten.
Often the only way to guard against such fallacies is to insist on a
value of P lower than is easily determined from tables. Thus, if k independent
tests of significance have been made, and only the smallest value P is reported,
its significance should be judged not by this value P itself, but by the probability


$P' = 1 - (1 - P)^k = kP - \binom{k}{2}P^2 + \cdots$

of the least value being so small. If we equate P’ to some such standard value 
as .01, then P must, for this standard level of confidence, take only a fraction, 
approximately 1/k, of this value. Such a small probability will often fall 
outside the range of existing tables. 
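Here is a small sketch of the adjustment above. It assumes the k tests are independent, so that P' = 1 − (1 − P)^k holds exactly. The function names are illustrative only.

```python
import math

def least_p_significance(p: float, k: int) -> float:
    """P' = 1 - (1 - P)**k: chance that the least of k independent P-values is <= P."""
    # log1p/expm1 keep precision when P is small.
    return -math.expm1(k * math.log1p(-p))

def per_test_level(p_prime: float, k: int) -> float:
    """Invert the relation: the P each test must reach so that P' equals p_prime."""
    return -math.expm1(math.log1p(-p_prime) / k)

print(least_p_significance(0.01, 10))    # reporting only the least of 10 P-values at .01
print(per_test_level(0.01, 10), 0.01 / 10)
```

The second line shows that the required per-test level is very close to P'/k, as the text says.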

Instead of relying on tables or direct computation from the exact distribution 
of a statistic, it will sometimes be desirable to use a modification of the statistic, 
selected so as to have the normal or some other standard distribution. We 
shall consider a type of transformation of a statistic such that the distribution 
becomes the limiting form of the original distribution as the sample size increases. 
Thus our transformation will reduce to the application to the statistic of a cor- 
rection which will be small when the sample is large. We shall show how to 
make simple approximate corrections of this character for two cases. 







































5 Tables of the Incomplete Beta Function, Biometrika Office, 1934.
6 Loc. cit. Tables IV and VI.
7 Calculation and Interpretation of Analysis of Variance and Covariance. Ames, Iowa,
Collegiate Press, 1934.











DISTRIBUTION OF STATISTICS 89 


The first of these is the Student ratio t, the lower limit of the integral in (1).
Putting


(3)   $\phi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

(4)   $P = 2\int_x^\infty \phi(z)\,dz$


which, in view of (1) and the fact that the integral of each distribution from −∞
to ∞ is unity, is equivalent to


(5)   $\int_0^x \phi(z)\,dz = \int_0^t \phi_n(z)\,dz$


we shall show that x has an asymptotic expansion:


(6)   $x \sim t - \dfrac{t^3 + t}{4n} + \dfrac{13t^5 + 8t^3 + 3t}{96n^2} - \dfrac{35t^7 + 19t^5 + t^3 - 15t}{384n^3} + \dfrac{6271t^9 + 3224t^7 - 102t^5 - 1680t^3 - 945t}{92{,}160\,n^4} - \cdots$


It will frequently be a sufficient approximation to treat


$t\left(1 - \dfrac{t^2 + 1}{4n}\right)$

as normally distributed. These appear to be approximations of practical
value when n > t².
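The partial sums of (6) are easy to tabulate. The sketch below (function names hypothetical) evaluates the first one to five terms. As a reference, it also computes the exact normal deviate x from P = 2∫_x^∞ φ(z) dz using Python's `statistics.NormalDist`. For t = 3.646 and n = 30, the P = .001 case of Section 3, the successive sums come out close to 3.646, 3.212, 3.313, 3.283 and 3.293, against an exact x of about 3.291.

```python
from statistics import NormalDist

# Terms of (6): term j is sign * poly(t) / (denominator * n**j),
# with each polynomial stored as {power of t: coefficient}.
_TERMS = [
    (+1, {1: 1}, 1),
    (-1, {3: 1, 1: 1}, 4),
    (+1, {5: 13, 3: 8, 1: 3}, 96),
    (-1, {7: 35, 5: 19, 3: 1, 1: -15}, 384),
    (+1, {9: 6271, 7: 3224, 5: -102, 3: -1680, 1: -945}, 92160),
]

def x_partial_sums(t: float, n: int) -> list:
    """Successive partial sums of the asymptotic series (6) for the normal deviate x."""
    sums, total = [], 0.0
    for j, (sign, poly, denom) in enumerate(_TERMS):
        value = sum(c * t**k for k, c in poly.items())
        total += sign * value / (denom * n**j)
        sums.append(total)
    return sums

def exact_x(p_two_sided: float) -> float:
    """Normal deviate x with P = 2 * (upper normal tail beyond x), as in (4)."""
    return NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)

print(x_partial_sums(3.646, 30), exact_x(0.001))
print(x_partial_sums(4.587, 10))  # n = 10: the terms oscillate badly
```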

The second statistic whose transformation to a function having its limiting
distribution we shall consider is the generalized Student ratio T. It is appropriate
to all the uses to which t can be put, but uses a multiplicity of variates instead
of one as the basis of the test of significance.⁸ It is defined with
reference to variates x_1, ..., x_p, together with a linear function of sample values
(proportional, for example, to the difference between the means in two samples),
such that, if ξ_i is the value of this function of the sample values of x_i (i = 1, ...,
p), then the variance of ξ_i in the population sampled is the same as that of x_i,
and, on the hypothesis to be tested, the population mean of each ξ_i is zero.
In terms of unbiased quadratic estimates s_ij of the covariances σ_ij among
x_1, ..., x_p, each based on n degrees of freedom, we may define l_ij as the cofactor
of s_ij divided by the determinant of the statistics s_ij. Then T is defined by


(7)   $T^2 = \sum_{i=1}^{p}\sum_{j=1}^{p} l_{ij}\,\xi_i\,\xi_j$


8 Harold Hotelling, The Generalization of Student's Ratio. Annals of Mathematical
Statistics, vol. 2 (1931), pp. 360-378.













the summations running independently over i and j from 1 to p.
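Definition (7) amounts to T² = ξ′S⁻¹ξ. The reason is that l_ij, the cofactor of s_ij divided by the determinant, is the (i, j) element of the inverse of the symmetric matrix (s_ij). The pure-Python sketch below assumes that reading. It solves S w = ξ by elimination instead of forming cofactors, and it is intended only for small p. With p = 1 it gives T² = ξ²/s_11, the square of the ordinary t.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (small p only)."""
    p = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][p] for r in range(p)]

def hotelling_t2(s, xi):
    """T^2 = sum_ij l_ij xi_i xi_j, with (l_ij) the inverse of the covariance estimates (s_ij)."""
    w = solve_linear(s, xi)  # w = S^{-1} xi, so T^2 = xi . w
    return sum(x * wi for x, wi in zip(xi, w))

print(hotelling_t2([[4.0]], [3.0]))                        # p = 1: (xi / s)^2
print(hotelling_t2([[2.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # a 2-variate example
```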
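For independent samples from a multivariate normal population, the distribution
of T has been shown⁹ to be


(8)   $\dfrac{2\,\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{n-p+1}{2}\right) n^{p/2}} \cdot \dfrac{T^{p-1}\,dT}{\left(1 + \frac{T^2}{n}\right)^{\frac{n+1}{2}}}$


As n increases, the distribution of T approaches the χ distribution with p degrees
of freedom:


(9)   $\dfrac{\chi^{p-1} e^{-\chi^2/2}\,d\chi}{2^{\frac{p-2}{2}}\,\Gamma\left(\frac{p}{2}\right)}$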

By equating the probabilities derived from these two distributions, we shall
define χ as a function of T, and obtain asymptotic expansions for the functions
χ and χ² thus defined.

Since the probability associated with T is expressible in terms of the incomplete
beta function, or the analysis of variance distribution integral, it follows that
any of the many common statistics of which simple functions have this distribution
can be expressed simply in terms of T. Tests of significance in a wide
variety of cases may therefore be made with the help of the asymptotic expansion
corresponding to T², together with a table of χ².

A further advantage of transforming a statistic into a normally
distributed variate of unit variance and zero mean is that further statistical
tests are possible with such variates. Since a great part of statistical theory is
based on the assumption of such normal distributions, an extensive field of
applications becomes available in this way. For example, if several independent
tests give values of t based on various numbers of degrees of freedom, and it is
desired to combine these tests so as to get a single probability, the corresponding
values of the normally distributed variate x defined above may be squared and
added. The sum will then have the χ² distribution, with a number of degrees of
freedom equal to the number of values of t used. In a similar manner, the
values of χ² corresponding to a number of independently determined values of
T² may be added, and the sum will have the χ² distribution with a number of
degrees of freedom equal to the sum of the various values of p involved.
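The sketch below illustrates the combination procedure just described. It assumes that the first three terms of (6) convert t to x adequately; the text suggests this holds when n > t². Python's standard library has no incomplete gamma function, so the χ² tail is computed here only for an even number 2m of degrees of freedom. In that case the closed form e^(−χ²/2) Σ_{k<m} (χ²/2)^k / k! applies. This restriction and all the names are ours.

```python
import math

def t_to_x(t: float, n: int) -> float:
    """Normal deviate from Student's t using the first three terms of (6)."""
    return (t - (t**3 + t) / (4 * n)
            + (13 * t**5 + 8 * t**3 + 3 * t) / (96 * n**2))

def chi2_sf_even(x2: float, df: int) -> float:
    """Upper chi-square tail for EVEN df: exp(-x2/2) * sum_{k < df/2} (x2/2)**k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def combine_t_tests(pairs):
    """pairs: (t, n) from independent tests; returns (sum of squared x's, degrees of freedom)."""
    x2 = sum(t_to_x(t, n) ** 2 for t, n in pairs)
    return x2, len(pairs)

x2, df = combine_t_tests([(2.1, 20), (1.7, 30)])  # two hypothetical tests
print(x2, df, chi2_sf_even(x2, df))
```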

The advantages of what may be called "normalization" of a
statistic have been brought out by R. A. Fisher for the particular case of the
correlation coefficient. His use¹⁰ of

$z = \tfrac{1}{2}\log\dfrac{1 + r}{1 - r}$

facilitates such operations as the averaging of values obtained from independent samples, or taking the





9 Harold Hotelling, loc. cit.
10 Statistical Methods for Research Workers, Sec. 35. 










difference between two values, with the testing of significance of the result in
each case. This is because z, unlike r, has a nearly normal distribution, with
variance nearly independent of the population value. We note in passing that
this function is the same as tanh⁻¹ r, and may therefore be determined accurately
and readily from the Smithsonian Institution Tables of Hyperbolic and Exponential
Functions.
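Here is a minimal sketch of the z transformation and its identity with tanh⁻¹ r. Averaging on the z scale and mapping back with tanh is shown for illustration. The unweighted average is a simplifying assumption, appropriate for example when the samples are of equal size; it is not a rule stated in the text.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's z = (1/2) log((1 + r) / (1 - r)) for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def average_on_z_scale(rs):
    """Average several r's by averaging their z values and mapping back with tanh."""
    z_bar = sum(fisher_z(r) for r in rs) / len(rs)
    return math.tanh(z_bar)

print(fisher_z(0.5), math.atanh(0.5))      # the same number
print(average_on_z_scale([0.2, 0.5, 0.8]))
```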


2. Normalization of t. The "duplication formula" in the theory of the
Gamma function¹¹ shows that


$\Gamma\left(\dfrac{n+1}{2}\right) = \dfrac{\sqrt{\pi}\,\Gamma(n)}{2^{n-1}\,\Gamma\left(\frac{n}{2}\right)}$


Substituting this in (2) and taking logarithms, we have:


(10)   $\log\phi_n(z) = -\tfrac{1}{2}\log n - (n-1)\log 2 + \log\Gamma(n) - 2\log\Gamma\left(\tfrac{n}{2}\right) - \tfrac{n+1}{2}\log\left(1 + \tfrac{z^2}{n}\right)$


The last logarithm may be expanded in a series of powers of z²/n which not only
converges uniformly on the interval 0 ≤ z ≤ t when n > t², but also has the property
of being a uniformly asymptotic representation of the function on this interval.
This means that the sum of the first j terms of the series (j = 0, 1, 2, ...) differs
from the function represented by a quantity whose product by n^j has, for
sufficiently large values of n, an upper bound independent of z, so long as z
remains in this interval. Uniformly asymptotic series have a number of
important properties, among which is¹² term-by-term integrability with respect
to z. In this sense we have the uniform asymptotic representation:


(11)   $\tfrac{n+1}{2}\log\left(1 + \tfrac{z^2}{n}\right) \sim \dfrac{z^2}{2} + \dfrac{2z^2 - z^4}{4n} + \dfrac{2z^6 - 3z^4}{12n^2} + \dfrac{4z^6 - 3z^8}{24n^3} + \cdots$


We shall obviously have another uniform asymptotic representation if we add to
this, term by term, asymptotic series with terms independent of z, such as those
for the gamma function logarithms in (10). Since¹³


(12)   $\log\Gamma(n) \sim \tfrac{1}{2}\log 2\pi + \left(n - \tfrac{1}{2}\right)\log n - n + \sum_{r=1}^{\infty} \dfrac{(-1)^{r-1} B_r}{2r(2r-1)\,n^{2r-1}}$


where $B_1 = \tfrac{1}{6},\ B_2 = \tfrac{1}{30},\ B_3 = \tfrac{1}{42},\ B_4 = \tfrac{1}{30},\ \cdots$
11 Whittaker and Watson, Modern Analysis, 4th ed., p. 240.
12 H. Schmidt, Beiträge zu einer Theorie der allgemeinen asymptotischen Darstellungen.
Math. Annalen, vol. 113 (1937), pp. 629-656. The property mentioned above is proved in
Schmidt's Theorem 6.
13 Whittaker and Watson, loc. cit., pp. 252, 125.











are the Bernoulli numbers, upon substituting in (10) this and the
similar formula for log Γ(n/2), together with (11), and simplifying, we obtain


(13)   $\log\phi_n(z) \sim -\tfrac{1}{2}\log 2\pi - \dfrac{z^2}{2} + \dfrac{-1 - 2z^2 + z^4}{4n} + \dfrac{3z^4 - 2z^6}{12n^2} + \dfrac{1 - 4z^6 + 3z^8}{24n^3} + \dfrac{5z^8 - 4z^{10}}{40n^4} + \cdots$
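The duplication formula, the series (12) and the expansion (13) can be spot-checked against `math.lgamma`. The sketch below is our own check, with hypothetical names. It uses the Bernoulli numbers as listed with (12), and compares (13) with the exact logarithm of (2). The error of (13) should shrink as more powers of 1/n are kept, provided n is large compared with z².

```python
import math

def log_phi_n_exact(z: float, n: int) -> float:
    """Exact log of Student's density (2), via math.lgamma."""
    return (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
            - 0.5 * math.log(n * math.pi) - (n + 1) / 2 * math.log1p(z * z / n))

def log_phi_n_series(z: float, n: int, terms: int = 4) -> float:
    """Right side of (13), keeping the first `terms` powers of 1/n."""
    z2 = z * z
    coeffs = [
        (-1 - 2 * z2 + z2**2) / 4,
        (3 * z2**2 - 2 * z2**3) / 12,
        (1 - 4 * z2**3 + 3 * z2**4) / 24,
        (5 * z2**4 - 4 * z2**5) / 40,
    ]
    value = -0.5 * math.log(2 * math.pi) - z2 / 2
    for j, c in enumerate(coeffs[:terms], start=1):
        value += c / n**j
    return value

def duplication_gap(n: float) -> float:
    """log Gamma((n+1)/2) minus log[sqrt(pi) Gamma(n) / (2**(n-1) Gamma(n/2))]; ~0."""
    return (math.lgamma((n + 1) / 2)
            - (0.5 * math.log(math.pi) + math.lgamma(n)
               - (n - 1) * math.log(2) - math.lgamma(n / 2)))

def stirling_log_gamma(n: float, r_max: int = 3) -> float:
    """Series (12) truncated after r_max terms (B_1..B_4 = 1/6, 1/30, 1/42, 1/30)."""
    bernoulli = [1 / 6, 1 / 30, 1 / 42, 1 / 30]
    s = 0.5 * math.log(2 * math.pi) + (n - 0.5) * math.log(n) - n
    for r in range(1, r_max + 1):
        s += (-1) ** (r - 1) * bernoulli[r - 1] / (2 * r * (2 * r - 1) * n ** (2 * r - 1))
    return s

print(duplication_gap(7.0), stirling_log_gamma(10.0) - math.lgamma(10.0))
for k in (1, 2, 3, 4):
    print(k, log_phi_n_series(1.5, 50, k) - log_phi_n_exact(1.5, 50))
```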


Upon differentiating (5) we obtain:


(14)   $\phi(x)\,\dfrac{dx}{dt} = \phi_n(t)$


Since φ is simply the normal distribution function (3), this may be written:


(15)   $-\tfrac{1}{2}\log 2\pi - \dfrac{x^2}{2} + \log\dfrac{dx}{dt} = \log\phi_n(t)$


Throughout this paper the symbol "lim" means the limit as n
approaches infinity. The functions of n and z, or of n and t, which we shall
denote by R, R', R'', with or without subscripts, are to be such that the absolute
value of each has an upper bound independent of n, z and t so long as n > 1,
and z and t are confined to some fixed finite interval.

From (13) we have that lim log φ_n(z) = log φ(z),
whence, by the continuity of the exponential function,


$\lim \phi_n(z) = \phi(z)$


This holds uniformly for 0 ≤ z ≤ t. Subtracting $\int_0^t \phi(z)\,dz$ from both sides of
(5), we therefore find that


(16)   $\int_t^x \phi(z)\,dz = \int_0^t \{\phi_n(z) - \phi(z)\}\,dz$


can, by choosing n large enough, be made as small as we please. Since φ(z) > 0,
it follows that the function x of t and n is such that

(17)   $\lim x = t.$


A parallel argument, proving slightly more than (17), is the following. From
(13),


$\log\phi_n(z) = \log\phi(z) + \dfrac{R'}{n}$


where R' is a bounded function of the kind described above. Therefore


$\phi_n(z) = \phi(z)\left(1 + \dfrac{R''}{n}\right)$


Substituting this in (16), we have that

$\int_t^x \phi(z)\,dz = \dfrac{1}{n}\int_0^t \phi(z)\,R''\,dz$

From the mean value theorem of integral calculus it then follows that


(18)   $x = t + \dfrac{R_1}{n}$


An asymptotic series may be substituted in a power series, and the result is a
valid asymptotic representation of the corresponding function (Schmidt,
loc. cit., Theorem 4). This justifies taking the exponential of each side of (13)
and arranging in a series of powers of n⁻¹ to give


(19)   $\phi_n(z) \sim \phi(z)\left[1 + \dfrac{a_1(z)}{n} + \dfrac{a_2(z)}{n^2} + \cdots\right]$
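Term-by-term integration of (19) can be illustrated numerically. The first two coefficient functions follow from exponentiating (13). By our own computation (they are not printed in the paper), a_1(z) = (z⁴ − 2z² − 1)/4 and a_2(z) = (3z⁸ − 28z⁶ + 30z⁴ + 12z² + 3)/96. The sketch integrates with a composite Simpson rule, which is an arbitrary choice. It also checks the identity ∫_0^t φ(z)a_1(z) dz = −¼(t³ + t)φ(t). That identity is the first of the equations (25), with f_1 = −(t³ + t)/4.

```python
import math

def phi(z):
    """Normal density (3)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def phi_n(z, n):
    """Student density (2), via lgamma for numerical stability."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c - (n + 1) / 2 * math.log1p(z * z / n))

def a1(z):
    return (z**4 - 2 * z**2 - 1) / 4

def a2(z):
    return (3 * z**8 - 28 * z**6 + 30 * z**4 + 12 * z**2 + 3) / 96

def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (hi - lo) / m
    s = f(lo) + f(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3

t, n = 2.0, 40
exact = simpson(lambda z: phi_n(z, n), 0.0, t)
zeroth = simpson(phi, 0.0, t)
second = simpson(lambda z: phi(z) * (1 + a1(z) / n + a2(z) / n**2), 0.0, t)
identity_gap = simpson(lambda z: phi(z) * a1(z), 0.0, t) + (t**3 + t) / 4 * phi(t)
print(exact - zeroth, exact - second, identity_gap)
```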


This asymptotic development will, like the original one, hold uniformly in
every finite interval, and may therefore be integrated term by term. Thus


(20)   $\int_0^t \phi_n(z)\,dz = \int_0^t \phi(z)\left[1 + \dfrac{a_1(z)}{n} + \cdots + \dfrac{a_j(z)}{n^j}\right]dz + \dfrac{R_{j+1}}{n^{j+1}}$


where |R_{j+1}| has an upper bound independent of n and t when n > 1 and t is
confined to a finite interval 0 ≤ t ≤ T. Substituting this in (16), we obtain:


(21)   $\int_t^x \phi(z)\,dz = \int_0^t \phi(z)\left[\dfrac{a_1(z)}{n} + \cdots + \dfrac{a_j(z)}{n^j}\right]dz + \dfrac{R_{j+1}}{n^{j+1}}$


In terms of a sequence of functions f_1, f_2, ... of t to be defined below, let

(22)   $x_j = t + \dfrac{f_1}{n} + \cdots + \dfrac{f_j}{n^j}$


Now $\int_t^{x_j} \phi(z)\,dz$ can be expanded in a series of powers of n⁻¹ which converges for
sufficiently large values of n; for the Taylor series


(23)   $\phi(z) = \phi(t) + (z - t)\,\phi'(t) + \cdots$


can be integrated to give a series of powers of x_j − t, which by (22) is a polynomial
in n⁻¹. Indeed, (22) shows that x_j − t can be made
arbitrarily small by taking n large enough; consequently the series (23) and the series
obtained from it by integration will converge uniformly and absolutely.
We thus have, with φ, φ', φ'', ... evaluated at t:


(24)   $\int_t^{x_j} \phi(z)\,dz = \dfrac{1}{n} f_1\phi + \dfrac{1}{n^2}\left(f_2\phi + \tfrac{1}{2} f_1^2 \phi'\right) + \dfrac{1}{n^3}\left(f_3\phi + f_1 f_2 \phi' + \tfrac{1}{6} f_1^3 \phi''\right) + \cdots$





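Now let us define f_1, f_2, ... by equating the coefficient of each power of n⁻¹
in (24) to that of the same power of n⁻¹ in the right member of (21). This process
gives a sequence of equations


(25)   $f_1\phi = \int_0^t \phi(z)\,a_1(z)\,dz$

       $f_2\phi + \tfrac{1}{2} f_1^2\phi' = \int_0^t \phi(z)\,a_2(z)\,dz$

       $f_3\phi + f_1 f_2\phi' + \tfrac{1}{6} f_1^3\phi'' = \int_0^t \phi(z)\,a_3(z)\,dz$

       $f_4\phi + \left(f_1 f_3 + \tfrac{1}{2} f_2^2\right)\phi' + \tfrac{1}{2} f_1^2 f_2\phi'' + \tfrac{1}{24} f_1^4\phi''' = \int_0^t \phi(z)\,a_4(z)\,dz$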


Since φ ≠ 0, the first of these equations defines f_1 for every value of t; once f_1
has been determined, the second equation defines f_2; then the third defines f_3,
and so forth. It is to be observed that the functions f_1, f_2, ... thus determined
are not changed when the value of j appearing in (22) is increased; we have a
unique sequence.

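If for the right-hand member of (15) we substitute that of (13), replacing
z by t, and on the left of (15) put


$x = t + \dfrac{f_1}{n} + \dfrac{f_2}{n^2} + \cdots, \qquad \dfrac{dx}{dt} = 1 + \dfrac{f_1'}{n} + \dfrac{f_2'}{n^2} + \cdots$


and then expand formally in powers of n⁻¹, we shall, upon equating
coefficients of like powers of n⁻¹, obtain a sequence of differential equations


(26)   $f_1' - t f_1 = \tfrac{1}{4}\left(-1 - 2t^2 + t^4\right)$

       $f_2' - t f_2 - \tfrac{1}{2}\left(f_1^2 + f_1'^2\right) = \tfrac{1}{12}\left(3t^4 - 2t^6\right)$

       $\cdots$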

These, with the initial condition f_1 = f_2 = ... = 0 for t = 0, determine the same
sequence of functions as before. The equations (26) are in fact obtainable
simply by differentiating (25) and cancelling out the factor φ(t). That this
must be true follows from the equivalence of the various formal processes of
manipulating series of powers of n⁻¹, whether convergent or divergent, to give
equivalent results. The differential equations are easily solved; the solutions,
at least for f_1, f_2, f_3, and f_4, are all polynomials. Why they should come out as
polynomials is not immediately obvious; but their calculation is made easier if
each f_j is replaced in the differential equations by a polynomial of degree 2j + 1
with undetermined coefficients, involving only odd powers of t. The f's of lower
order are replaced by values previously determined, and the coefficients are


found by equating like powers of t. This process supplies at each stage more
equations than unknown coefficients; their consistency verifies the assumption
that f_j is a polynomial of the kind specified, at least for j ≤ 4. These polynomials
are the coefficients of the powers of n⁻¹ in (6).
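The polynomial solutions can be checked exactly with rational arithmetic. This sketch takes f_1 = −(t³ + t)/4 and f_2 = (13t⁵ + 8t³ + 3t)/96, read off from (6). It verifies them against the first differential equation, f_1′ − t f_1 = ¼(−1 − 2t² + t⁴). It also checks the next one, f_2′ − t f_2 − ½(f_1² + f_1′²) = (3t⁴ − 2t⁶)/12, which we derived by carrying the expansion of (15) to the n⁻² term. Polynomials are stored as dictionaries from exponent to `Fraction` coefficient, and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Polynomials in t as {power: coefficient}, with exact rational coefficients.
def p_add(*ps):
    out = {}
    for p in ps:
        for k, c in p.items():
            out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def p_mul(p, q):
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: c for k, c in out.items() if c != 0}

def p_scale(p, s):
    return {k: c * s for k, c in p.items() if c * s != 0}

def p_deriv(p):
    return {k - 1: k * c for k, c in p.items() if k != 0}

T = {1: F(1)}                                   # the polynomial t
f1 = {3: F(-1, 4), 1: F(-1, 4)}                 # -(t^3 + t)/4
f2 = {5: F(13, 96), 3: F(8, 96), 1: F(3, 96)}   # (13t^5 + 8t^3 + 3t)/96

# First equation: f1' - t f1 = (-1 - 2t^2 + t^4)/4
lhs1 = p_add(p_deriv(f1), p_scale(p_mul(T, f1), -1))
rhs1 = {0: F(-1, 4), 2: F(-2, 4), 4: F(1, 4)}

# Next equation: f2' - t f2 - (f1^2 + f1'^2)/2 = (3t^4 - 2t^6)/12
d1 = p_deriv(f1)
lhs2 = p_add(p_deriv(f2), p_scale(p_mul(T, f2), -1),
             p_scale(p_add(p_mul(f1, f1), p_mul(d1, d1)), F(-1, 2)))
rhs2 = {4: F(3, 12), 6: F(-2, 12)}

print(lhs1 == rhs1, lhs2 == rhs2)
```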

The series on the right of (24) not only converges but is an asymptotic series
uniformly valid when t varies in any finite interval. Hence, upon subtracting
(24) from (21) and taking account of (25), we find that


$\int_{x_j}^{x} \phi(z)\,dz = \dfrac{R_{j+1}}{n^{j+1}}$


where |R_{j+1}| is uniformly bounded. Upon applying the mean value theorem
to the integral on the left, we find that x differs from x_j, and thus from the leading
terms (22) of the series (6), by a quantity whose product by n^{j+1} remains bounded
as n approaches infinity. This proves the validity of the asymptotic expansion.


3. Accuracy of the Approximation. To follow through the above processes
in such a way as to obtain useful limits for the error involved in using the first
few terms of the series (6) in place of x would be excessively difficult. However,
the tables below suggest that the error in taking the first two or three terms as an
approximation to x is small enough for practical purposes, provided n > t². The
essential singularity of the normal distribution at infinity, in contrast with the
algebraic nature of the Student distribution, means that one approximates the other
less well as t increases while n remains fixed, though better as n increases. This is illustrated
in the following tables, where it will be observed that the approximations are
better for large than for small values of n, and of


$P = 2\int_x^\infty \phi(z)\,dz = 2\int_t^\infty \phi_n(z)\,dz$


It will be seen that, for n = 10 and P ≤ .001, the utility of the asymptotic
series, or at least of its first five terms, is vitiated by the rapid oscillation of
consecutive terms, due to the high values of t² in relation to n.


P= .0l 


n=10 n = 30 





                        P = .001                   P = .0001
Terms of (6)       n = 10     n = 30       n = 10     n = 30     n = 100
1                   4.587      3.646         6.22      4.482      4.052
2                   2.059      3.212          .05       3.69       3.88
3                   4.891      3.313        12.86       3.98       3.89
4                   0.896      3.283       -20.44       3.85       3.89
5                   7.163      3.293        75.66       3.91       3.89
Exact x                  3.291                        3.891


4. Transformation of the Generalized Student Ratio. The arguments and
methods of calculation set forth in Section 2 may be applied with little or no
change to the transformation of various other statistics in such a way that the
limiting distribution for large samples is reached at once for the transformed
statistic. In particular, to deal with the generalized Student ratio T, we may
equate the probabilities derived from (8) and (9), represent χ as an asymptotic expansion with undetermined
coefficients which are functions of T, and then, by substituting and equating
like powers of n⁻¹, obtain as before a sequence of differential equations for
determining the coefficients. This process gives


(27)   $\chi \sim T - \dfrac{T^3 + pT}{4n} + \dfrac{13T^5 + (4 + 4p)T^3 + (8 - 5p^2)T}{96n^2} - \cdots$


This reduces to the expansion of x in terms of t found above if we put p = 1.

In practice it is somewhat more convenient to use χ² and T², which avoids
extracting the square root of the latter expression and allows use of the existing tables
of χ². Ordinarily, therefore, we should use not (27) but the series


(28)   $\chi^2 \sim T^2 - \dfrac{pT^2 + T^4}{2n} + \dfrac{(4 - p^2)T^2 + (2 + 5p)T^4 + 8T^6}{24n^2} - \cdots$


which may be obtained in the same way, or by squaring (27) formally.
That these are genuine asymptotic approximations follows by essentially the
same argument as before.
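Here is a sketch of the expansions (27) and (28), with hypothetical names. For p = 2 an exact comparison is available. Integrating (8) gives P(T > T_0) = (1 + T_0²/n)^(−(n−1)/2), while the χ distribution (9) with two degrees of freedom has tail e^(−χ²/2). Equating the two gives χ² = (n − 1) log(1 + T²/n). That closed form is our own derivation and is used here only as a check. For p = 1 the code also confirms that (27) reduces to the first three terms of (6).

```python
import math

def chi_from_T(T: float, n: int, p: int) -> float:
    """First three terms of (27)."""
    return (T - (T**3 + p * T) / (4 * n)
            + (13 * T**5 + (4 + 4 * p) * T**3 + (8 - 5 * p**2) * T) / (96 * n**2))

def chi2_from_T2(T2: float, n: int, p: int, terms: int = 3) -> float:
    """First `terms` terms of (28), written in terms of T^2."""
    parts = [
        T2,
        -(p * T2 + T2**2) / (2 * n),
        ((4 - p**2) * T2 + (2 + 5 * p) * T2**2 + 8 * T2**3) / (24 * n**2),
    ]
    return sum(parts[:terms])

def chi2_exact_p2(T2: float, n: int) -> float:
    """Exact chi-square equivalent of T^2 when p = 2 (derived as in the lead-in)."""
    return (n - 1) * math.log1p(T2 / n)

n, T2 = 50, 4.0
print(chi2_from_T2(T2, n, 2, terms=2), chi2_from_T2(T2, n, 2), chi2_exact_p2(T2, n))
```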


COLUMBIA UNIVERSITY AND WASHINGTON, D. C. 





ON COMBINED EXPANSIONS OF PRODUCTS OF SYMMETRIC POWER 
SUMS AND OF SUMS OF SYMMETRIC POWER PRODUCTS 
WITH APPLICATIONS TO SAMPLING (Continued) 


By PAUL S. DWYER


PART II. THE FUNDAMENTALS OF SAMPLING 
Introduction 


We consider a population of N variates in which every individual possesses a
common attribute. Let the variate x_i be the measure of such an attribute for
individual i. From the N variates it is possible to form $\binom{N}{n}$ different samples,
where each sample consists of n variates, n < N.

Each sample has its mean, variance, etc., so that there are $\binom{N}{n}$ means, $\binom{N}{n}$
variances, etc. The fundamental sampling problem, as interpreted here, is to
express the moments of the $\binom{N}{n}$ means, and the moments of the $\binom{N}{n}$ variances,
in terms of the moments of the universe.
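Here is a brute-force sketch of the sampling problem for a small universe. It enumerates all C(N, n) samples with `itertools.combinations` and computes the mean and variance of the sample means. The comparison value uses the standard finite-population result var(x̄) = (σ²/n)(N − n)/(N − 1), where σ² is the universe variance computed with divisor N. That formula is brought in for illustration and is not taken from this paper. The six-element universe is arbitrary.

```python
from itertools import combinations
from statistics import fmean, pvariance

def sample_mean_moments(universe, n):
    """Mean and variance of the means of all C(N, n) samples drawn without replacement."""
    means = [fmean(s) for s in combinations(universe, n)]
    return fmean(means), pvariance(means)

universe = [2, 3, 5, 7, 11, 13]
N, n = len(universe), 3
mean_of_means, var_of_means = sample_mean_moments(universe, n)

# Standard finite-population result, sigma^2 taken with divisor N.
sigma2 = pvariance(universe)
predicted = sigma2 / n * (N - n) / (N - 1)
print(mean_of_means, fmean(universe), var_of_means, predicted)
```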


Numerous attempts have been made to solve this problem, but each has been 
restricted in some way. It is the aim of Part II to indicate an approach which is 
broad enough to include many of the fundamental variations. 

The first chapter is devoted to a listing of criteria which should be satisfied 
by a theoretical development which is to be considered sufficiently general. 
These criteria might be applied to other statistics but the theory developed 
here is limited to those statistics which are moments (or functions of moments) of 
moments. The first chapter continues with an account of the more significant 
papers which have contributed to a general solution of the problem. No attempt 
is made to indicate a complete history, but rather there is presented a brief 
summary of a number of the most significant contributions. 

The second chapter is devoted to definitions and notation. An attempt has 
been made to use conventional notation whenever it is suitable. 

The third chapter deals with some of the fundamental principles which are 
used in the general approach. It presents a crucial part of the argument 
as it shows how various types of sampling problems can be reduced to Carver 
functions. 

The last three chapters deal with specific applications to some of the simpler 
problems. Chapter IV discusses the case of moments of the mean of the sample. 

Chapter V considers the mean of the variance and the variance of the variance, 
while Chapter VI gives a large number of formulas, implicitly, in tabular form, 






























Chapter I. 





A Brief History of Previous Contributions 


In order to assist the reader in getting a perspective with reference to previous 
mathematical work on the relations between the moments of the moments of 
the sample and the moments of the complete set of measures (universe), a list of 
criteria¹ is suggested below which might be applied to each contribution. These
criteria group themselves naturally into two classes. The first eight questions 
can be answered categorically, while the remainder are less definite in nature 
and are not so subject to categorical answers. 

1. The Criteria. 1. Does the method apply to one type of frequency distri- 
bution only or is it broad enough in scope to include any distribution law? 

2. Is there any restriction as to the size of the sample? 

3. Is there any restriction as to the size of the universe? 

4. Is there any restriction as to the nature of the correlation between ob- 
servations? More specifically, is the method applicable only to some particular 
law of formation of the sample, such as “drawing with replacements,” “drawing
without replacements,” etc., or is it broad enough in scope to allow application 
to other orderly replacement laws? 

5. Is the application limited to one characteristic (variable) or can a large 
number of characteristics be treated simultaneously? 

6. Is it necessary that the universe maintain the same frequency distribution
during the formation of the sample or may it assume a different frequency 
distribution before each drawing? 

7. Does the method produce exact, rather than approximate, formulas? 

8. Does the method permit approximations to a required degree of accuracy? 

9. Does the method enable the author to write general laws in a compact 
form? More specifically, can he express, in a form which is not too symbolic, 
any moment of a given sample moment? If not, what order of moments can 
be expressed? 

10. Is the notation such that the general case can be turned into the more 
important special cases with relative ease? 

11. Does the development lead logically to the introduction of new moment 
functions (such as the semi-invariant of Thiele [B’; 209] or the k functions of 
R. A. Fisher (23; 203]) which are useful in condensing the results? 

12. Is a combinatorial analysis provided so that any given formula, or any 
part of it, can be checked for accuracy without too much effort? 









2. Review of previous results. The articles below have been examined with 
the criteria in mind. No attempt is made to write specific answers to all the 





1 Many of these criteria have been suggested, in less explicit form, by Tchouproff [15;
461-471]. The “Introduction” of his Metron paper is recommended for use as a supplement
to the present chapter.

COMBINED EXPANSIONS 99 


criteria in each case, but rather to indicate the important features of each 
contribution. 

The papers discussed by no means cover all the work on moments of moments, 
although a rather complete bibliographical background is available to the reader 
who desires to examine the bibliographies attached to the articles mentioned. 
Undoubtedly the importance of the articles written in English has been over- 
emphasized. Since the important contributions of non-English writers (such as 
Thiele and Tchouproff) have eventually appeared in English, it does no serious 
harm to refer to the English versions even though the results may have been 
partially antedated by the author in some other language. 

A large number of the earlier results on moments of moments were limited to a 
special case of the problem, usually the case in which the universe is infinite 
and normal. The present summary deals with those authors who, during the 
past four decades, have made real contributions to the problem of generalization. 
A detailed account of the history of moments of moments would include many
valuable contributions which are not included here. 

It seems expedient to start with Pearson’s article “On the Probable Error of 
Frequency Constants” [2] which appeared at the opening of the century. 
Although by no means the first article in the field, it presented a rather complete 
set of formulas for the case of moments of moments. One advantage of these 
formulas is that they are relatively brief and yet this brevity results from the 
fact that they are approximate. The original paper dealt with the univariate
case, but it was followed by a later one [6] which discussed the case of more than
one variable. 

These formulas have played an important role in that they have assisted in
making it clear that the moments of moments of samples must be estimated if 
one is to be permitted to draw conclusions from his sampling moments and that 
it is possible to work out formulas which serve as the basis of those estimates. 

Of great importance also was the contribution of T. N. Thiele to the sampling problem. Adapting certain ideas of Laplace, he expressed his results in terms of semi-invariants and published them in English in 1903 in “The Theory of Observations” [B’; 209]. He took the case of the infinite parent and any law of distribution and then worked out the moments of the variance through the fourth.

An earlier contribution of the introductory period was that of Karl Pearson 
in 1899 [1]. This paper is significant in that it provides formulas for the four 
moments of the mean when sampling is from a finite universe. The universe is 
not general, but obeys a simple frequency law. 

Another article of this period was that of Robert Henderson (1904), in which the first four moments of the mean were given for an infinite universe with any frequency law. This article, which was first published in the Transactions of the Actuarial Society of America [3], was considered so important that it was republished in 1907 in the British Journal of the Institute of Actuaries. In addition to the first four moments of the mean, Henderson gave the first moments of $m_2$, $m_3$, and $m_4$, although the last of these formulas is erroneous.














Another important contribution of this period was that of “Student” in 
1908 [5]. He was interested in the properties of the normal distribution, but 
did not assume normality in his general derivation. He took an infinite popula- 
tion and wrote the formula for the variance of the variance. In this result he 
inserted the condition for normality. His further argument in the normal case 
implied the development of corresponding formulas for the higher moments of 
the variance, but he did not publish them as they were incidental to his main 
attack. The semi-invariant equivalent of these results had been previously 
given by Thiele [B’; 209-210]. 
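As a modern cross-check, and not a reproduction of “Student’s” own derivation or notation, the sketch below compares the standard infinite-population formula for the variance of $m_2$,
$\bar{\mu}_2(m_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}$, with a brute-force average over every ordered sample drawn with replacement from a small universe. Inserting the normality condition $\mu_4 = 3\mu_2^2$ reduces the formula to $2(n-1)\mu_2^2/n^2$. The check relies on the fact, noted in section 10, that sampling with replacement from a finite universe reproduces the infinite formulas. The universe values and function names are hypothetical, and `Fraction` keeps the comparison exact.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, r):
    """r-th moment of `values` about their own mean, divisor len(values)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** r for v in values) / n


def exact_var_of_m2(universe, n):
    """Variance of m2 over all N**n ordered samples drawn with replacement."""
    m2s = [central_moment(s, 2) for s in product(universe, repeat=n)]
    mean_m2 = sum(m2s) / len(m2s)
    return sum((m - mean_m2) ** 2 for m in m2s) / len(m2s)


def formula_var_of_m2(universe, n):
    """Standard infinite-population result in terms of mu_2 and mu_4."""
    mu2 = central_moment(universe, 2)
    mu4 = central_moment(universe, 4)
    return ((mu4 - mu2**2) / n
            - 2 * (mu4 - 2 * mu2**2) / n**2
            + (mu4 - 3 * mu2**2) / n**3)
```

For example, `exact_var_of_m2([0, 1, 1, 4, 7], 3)` and `formula_var_of_m2([0, 1, 1, 4, 7], 3)` return the same fraction.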

The real contribution of “Student” to the general problem of moments of 
moments was his method, for it is his method which has been utilized by later 
writers. “Student’s” method has the advantage that the development involves
algebraic processes only. Contributions of Neyman, Church, Pepper, Carver, 
and the present writer are based upon it. 

An important development during the next decade, 1908-1918, was the 
establishment of the first four moments of the mean when the samples were 
drawn from a finite parent without replacement. It appears that a number of 
men worked this problem independently. For example, one might examine the 
results of Pearson [4], Isserlis [7, 8], Mortara [C], Tchouproff [11], and Edge- 
worth [9]. Probably the best English presentations of that era were those of 
Isserlis [8] and Edgeworth [9] which appear in the same volume of the Journal 
of the Royal Statistical Society. 
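The simplest of these finite-parent results, the second moment of the mean, is usually written today as $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, where $\sigma^2$ is the universe variance with divisor $N$. The sketch below checks that form by enumerating every ordered sample drawn without replacement. It is a modern illustration under these assumptions, not the method of any of the papers cited. The data and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def exact_var_of_mean(universe, n):
    """Variance of the sample mean over all N^(n) ordered samples drawn
    without replacement (positions in `universe` are distinct individuals)."""
    means = [Fraction(sum(s), n) for s in permutations(universe, n)]
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means) / len(means)


def fpc_var_of_mean(universe, n):
    """sigma^2 / n times the finite-population factor (N - n) / (N - 1)."""
    N = len(universe)
    mu = Fraction(sum(universe), N)
    sigma2 = sum((Fraction(x) - mu) ** 2 for x in universe) / N
    return sigma2 / n * Fraction(N - n, N - 1)
```

When $n = N$ every ordered sample has the same mean, and both functions give zero.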

A most prolific writer on sampling during the next decade was the Russian, 
Tchouproff, who had been publishing in Russian and Scandinavian journals 
[10], [11]. His most valuable contributions were published in 1918-1923 in 
Biometrika (in English) and in Metron (in English). 

The first series of articles was published in three different numbers of Biometrika in the years 1918-19 [12]. Tchouproff assumed an infinite universe
and used the method of mathematical expectation. At first glance the most 
characteristic aspect of his work appears to be the complicated notation which 
he used. This notation was adopted because he undertook a much more general 
problem than had previously been attempted and hence needed to make new 
distinctions. Although he limited himself to the infinite case and one variable, 
he worked out the theory with the freedom that the frequency distribution of the 
universe might change between drawings. In the special case in which the 
populations are the same, he worked out the moments of the variance as far as 
the fourth. The chief criticism of his work concerns the complicated notation 
which seems to have been difficult to follow critically. A mistake in one of his 
formulas was not discovered for some years and then not by examination of his 
reasoning, but through the application of his results to an actual problem [17]. 

It is perhaps appropriate to insert here that in 1934 Feldman [30] rewrote the 
material of the second Biometrika article by simplifying the notation and extend- 
ing the argument to the case of two (and more) variables. 

Tchouproff continued to generalize his work, and in the 1923 volume of Metron [15] there appeared a series of articles in which there were no restrictions as to the size of the sample, no restrictions as to the type of sampling distribution (in fact the sampling distribution might vary between successive drawings), and no restrictions as to the law of replacement, or, more generally, as he expressed it, “no restriction as to the nature of the correlation between observations.” Criterion number 5 is the only one of the first eight criteria which is not satisfied, inasmuch as the approach is limited to that of a single variable. Also the notation was extremely complicated and, although Tchouproff gave general formulas for moments of moments, these formulas are so symbolic in form that he did not find it expedient to write out specific formulas beyond the variance of the variance for such an important special case as sampling from a finite parent without replacements.

During the same period J. Splawa-Neyman [14] had been examining the 
problem of sampling from a finite parent without replacements. He published 
his results in a Polish journal in 1923 [14] and his corrected results two years 
later in Biometrika [18]. He gave the well known formulas for the first four 
moments of the mean and a formula for the variance of the variance. He also 
gave some simple correlation formulas such as the correlation between the mean 
and the variance. 

At this time the basic problem of moments of moments, at least as it was interpreted by Pearson and his followers, was the establishment of the first four moments of the given moment of the sample so that a Pearson curve could be fitted. A. E. R. Church, a worker in Pearson’s laboratory, was assigned the task of seeing how the moments of the variance work out in actual practice. In doing this he became convinced that the formula for the fourth power of the variance, which had appeared in Tchouproff’s Biometrika article, was incorrect. He tried to follow the argument of Tchouproff, but apparently was baffled by the complex notation and finally, at the suggestion of Pearson, decided to carry through the formula using the method of “Student.” In doing this he discovered a mistake in the Tchouproff formula for the fourth power of the variance. At the same time he published [17] the formulas for the third and fourth power of the variance in the more conventional notation of that time.

It might be noted that it is particularly fitting that Church should discover 
this error since Tchouproff, as Pearson himself stated in an editorial [13], had 
pointed out a number of errors made by the Pearsonian school. 

In the next volume of Biometrika there appears an article by Church [19] in 
which, among other things, formulas are derived for the third and fourth 
moments of the variance in the case of a finite population, sampling without 
replacement. Church claimed no particular credit for these formulas. His 
point is rather that they are almost valueless from a practical standpoint chiefly 
because of their length. The formula for the fourth power of the variance 
occupies three and one-half of the large pages of Biometrika and is given with 
the apparent aim of indicating, as Pearson said [21; 209], “the practical futility 
of the theoretical formulas.” 














Church gave full credit to Neyman for the formula for the variance of the 
variance and made no mention of Tchouproff’s Metron work and of the more 
general presentation there given. This was particularly unfortunate because 
it exposed him to the charge that he ignored non-English authors. This charge 
was immediately made by Greenwood and Isserlis [20] who broadened it to 
include Neyman and, by implication, Pearson himself. They advocated the 
case of Tchouproff who, now dead, was unable to defend himself. They gave a 
survey (valuable to the cursory reader) of the pertinent contributions of the
Tchouproff articles and suggested that the ignoring of Tchouproff was par- 
ticularly disconcerting since it appears that Tchouproff had gone more than half 
way in his cooperation with English writers. 

Pearson replied in an interesting article [21] which made it clear that Neyman 
established his results independently of Tchouproff and that the language of 
Neyman is much simpler than the complicated notation of Tchouproff. Pearson 
emphasized that Tchouproff made no attempt to give specific formulas for the third and fourth moments of the variance in the case of sampling without replacements. Pearson did not answer, at least explicitly, the claim that the Tchouproff formulas are applicable to a more general case in which there is no restriction as to the nature of the correlation between observations.

The year 1928 was marked by two important contributions. We first mention 
that of C. C. Craig who published his thesis in Metron [22]. Extending the 
previous results of Thiele, he was able to write the semi-invariant equivalent 
of the basic formulas in much less space than their previous moment formulation 
had demanded. He was able to write products of sample moments as well as 
moments of the moments themselves. His results are limited to an infinite 
population and one variable. The bibliography attached to his paper is com- 
monly mentioned in later literature for its completeness. For infinite sampling 
it might properly be used as a supplement to the bibliography of this Part. 

A most important contribution was made by R. A. Fisher [23], who was able to simplify the infinite sampling formulas greatly. He did this by introducing sample functions whose expected values are cumulants (semi-invariants). In addition to the simplification, his ingenious attack resulted in the following contributions: (1) the recognition of the one-to-one correspondence between all possible independent sampling formulas and the partitions of numbers, (2) the recognition that the extension to the multivariate case is accomplished by use of the partitions of multipartite numbers, (3) the tabulation of numerous new formulas, and (4) the use of a general partition method by which any term in the formulas can be determined separately.
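A small sketch of the defining property: Fisher’s $k_2$ and $k_3$, written here in terms of sample power sums as $k_2 = \frac{n(2) - (1)^2}{n(n-1)}$ and $k_3 = \frac{n^2(3) - 3n(2)(1) + 2(1)^3}{n(n-1)(n-2)}$, are averaged over all ordered samples drawn with replacement (the infinite-sampling case). The averages are then compared with the universe cumulants $\kappa_2 = \bar{\mu}_2$ and $\kappa_3 = \bar{\mu}_3$. The data and function names are illustrative assumptions, not Fisher’s notation.

```python
from fractions import Fraction
from itertools import product


def power_sum(xs, a):
    """(a) = sum of x^a over the sample."""
    return sum(Fraction(x) ** a for x in xs)


def k2(xs):
    """Fisher's k2 from the sample power sums (1), (2)."""
    n = len(xs)
    s1, s2 = power_sum(xs, 1), power_sum(xs, 2)
    return (n * s2 - s1**2) / (n * (n - 1))


def k3(xs):
    """Fisher's k3 from the sample power sums (1), (2), (3); needs n >= 3."""
    n = len(xs)
    s1, s2, s3 = (power_sum(xs, a) for a in (1, 2, 3))
    return (n**2 * s3 - 3 * n * s2 * s1 + 2 * s1**3) / (n * (n - 1) * (n - 2))


def expected(stat, universe, n):
    """Average of `stat` over all ordered samples drawn with replacement."""
    samples = list(product(universe, repeat=n))
    return sum(stat(s) for s in samples) / len(samples)


def universe_cumulants(universe):
    """kappa_2 and kappa_3 of the universe (equal to mu-bar_2 and mu-bar_3)."""
    N = len(universe)
    mean = Fraction(sum(universe), N)

    def mu(r):
        return sum((Fraction(x) - mean) ** r for x in universe) / N

    return mu(2), mu(3)
```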

The further development of the combinatorial analysis was indicated by a 
paper by Fisher and Wishart which appeared in 1931 [27]. It was shown how 
the more involved patterns could be broken up into simpler ones. 

The study of the infinite case was continued by Georgescu [28], who extended the Craig results. A feature of his work was the utilization of functions which yielded expansions of formulas in terms of successive degrees of approximation. He applied Fisher’s idea of a combinatory analysis to the conventional sample moment function.

Another paper of this series was that of Wishart [29], who gave a descriptive account of the contributions of Craig, Fisher, and Georgescu and an indication of the means of translating the results of one writer into the language of another.

The work of Joseph Pepper which appeared in Biometrika in 1929 [24] should 
be noted. Pepper took the case of the finite parent, sampling without replace- 
ment, and two variables, and then gave an extensive list of results. He did not 
have a very condensed notation and was forced to assume an infinite universe 
for the higher moments which he studied. The important point, for historical 
purposes, is that Pepper combined bivariate and finite sampling. It is to be 
recalled that Tchouproff himself in his generalized theory gave no results for the 
multivariate case. 

A significant advance in finite sampling came with Carver’s editorial “Fundamentals of the Theory of Sampling,” which appeared in the first volume of the ANNALS OF MATHEMATICAL STATISTICS [25].
Carver took the case of a finite universe, one variable, and sampling without 
replacements. He presented a notation which enabled him to write the various 
moments of the mean through the eighth in simple form. He showed by a 
number of illustrations that his formula would give known results for cases 
both infinite and finite, when the proper restrictions were added. O’Toole [26] 
later generalized his results for any moment of the mean. 


3. Generalized Carver Functions and Sampling. The use of generalized 
Carver functions together with the results of Part I makes possible the presenta- 
tion of the general sampling theory in a compact, and yet not too symbolic, 
form. It is possible to write the sampling theory so that criteria 1-8 are satisfied 
although no attempt is made in the present paper to answer criterion 6. With 
reference to criteria 9-11, any affirmative answer must necessarily be tempered 
with qualifications as the results are far removed from that ideal solution which 
would permit one to determine the actual distribution of any sample moment. 
However the use of generalized Carver functions does permit a general concise 
statement of results as well as the determination of special cases. The method 
is also especially adapted to the introduction of new moment functions and to 
the use of partition analysis, although these topics are not emphasized in the 
present paper. In general, the use of Carver functions greatly assists in finding the theoretical sample statistics in the case of finite sampling, since the Carver functions are condensed expressions in the size of the sample and the size of the parent, may be easily checked from symmetry considerations, and are independent of the moments. They are also applicable to different replacement laws.


4. The Use of High Moments. Precise agreement between theoretical and practical sampling does not usually accompany the use of high moments, and the practical statistician is apt to agree with Pearson, who wrote, “I have a very firm conviction that the mathematician who uses high moments may make interesting contributions to mathematics, but he removes his work from any contact with actual statistics” [16; 117]. However, since the extent of agreement between theoretical and actual results is in a sense a measure of the extent to which theoretical assumptions are actually duplicated in the experiment, it does seem sensible to discover what relations exist in the ideal theoretical case. Thiele implicitly supported the theoretical use of high moments (even in studying actual problems) when he wrote [B’; 13]:

“Therefore the general rule of the formation of good laws of presumptive 
errors must be: 

1. In determining }, , and d¢ rely almost entirely upon the actual values. 

2. As to the half-invariants with high indices, say \. upwards, rely as ex- 
clusively upon theoretical considerations. 

a 

A more explicit advocate is R. A. Fisher who wrote [23; 200], “In the present 
state of our knowledge any information, however incomplete, as to sampling 
distributions is likely to be of frequent use, irrespective of the fact that moment 
functions only provide statistical estimates of high efficiency for a special type of 
distribution.” 


Chapter II. Notation and Definition 


The present chapter gives the fundamental definitions and appropriate 
notation. An attempt has been made to combine the most desirable features 
of the different notations of earlier writers. 


5. Ordered Sample. An ordered sample is a sample in which distinction is made as to the order in which the variates enter the growing sample. Thus the sample found by drawing $x_2$ and then $x_1$ is the same sample as that obtained by drawing $x_1$ and then $x_2$, but it is a different ordered sample.

In some types of sampling it is possible that a given variate may appear more than once in the same sample. In general the number of ordered samples varies with the number of repeated variates. Thus the sample $x_1 + x_1$ results from but one ordered sample, while $x_1 + x_2$ results from either of two ordered samples.
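The counting in this section can be checked by brute force. The sketch below assumes drawing with replacement and uses hypothetical helper names. It maps each sample, written as a sorted tuple, to the number of ordered samples that produce it, and compares that count with $n!/(r_1!\,r_2!\cdots)$, where the $r_i$ are the repetition counts of the variates. That closed form is a standard combinatorial fact, not a formula from the text.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def ordered_sample_counts(labels, n):
    """Map each sample (as a sorted tuple) to the number of ordered samples
    that produce it when n drawings are made with replacement."""
    return Counter(tuple(sorted(draw)) for draw in product(labels, repeat=n))


def count_from_multiplicities(sample):
    """n! / (r1! r2! ...), where r_i are the repeat counts of the variates."""
    repeats = Counter(sample).values()
    return factorial(len(sample)) // prod(factorial(r) for r in repeats)


counts = ordered_sample_counts(["x1", "x2", "x3"], 2)
```

Here `counts[("x1", "x1")]` is 1 and `counts[("x1", "x2")]` is 2, matching the two cases described above.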


6. Power Sums. Power sums have the same meaning as in section 11 of 
Part I. An adjustment of notation is necessary as we need to distinguish power 
sums of the sample from power sums of the universe. The a-th power sum of 
the universe is denoted by (A) while the sample power sum is denoted by (a). 
Similarly, bold-faced numerals are used to indicate power sums of the universe, 
while light-faced numerals are used to indicate power sums of the sample. The 
symbol $(\bar{A})$ is used to indicate that the variates are deviations from the mean of
the universe. 







7. Power Product Sums. Power product sums, called power products for brevity, also have the same meaning as in section 11 of Part I. Large letters are used to represent the power products of the universe, while small letters are used to indicate the power products of the sample. Thus $(Q_1Q_2\cdots Q_s)$ represents a power product of the universe, while $(q_1q_2\cdots q_s)$ represents the corresponding power product of the sample. Power products are not used extensively except in the development of the theory of the next chapter, where they play an important role.


8. Expected Values. If a given statistical function, $z$, is formed for every possible sample, then the arithmetic mean of the $z$'s is the expected value of $z$. Thus

$$E(z) = \frac{\sum z}{S},$$

where the summation extends over all possible samples and $S$ is the number of such samples.


9. Moments. Moments demand precise notation since distinction must be 
made between moments of the universe, moments of the sample, moments of 
the moments of the sample, and moments about the mean for these cases. In
addition we wish to indicate whether or not the universe is measured about its 
mean. 

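a. Moments of the universe. The conventional $\mu$'s are used to indicate the moments of the universe. In this notation $\bar{\mu}$ is used to indicate the moment about the mean of the universe. Thus

$$\mu_t = \frac{(T)}{N}, \qquad \bar{\mu}_t = \frac{(\bar{T})}{N}.$$

The usual formula relating $\bar{\mu}_t$ and $\mu_t$ [22; 20] may be written

$$\bar{\mu}_t = \sum_{s=0}^{t} (-1)^s \binom{t}{s} \mu_{t-s}\, \mu_1^{s}, \qquad \{1\}$$

so that

$$\bar{\mu}_2 = \frac{(\mathbf{2})}{N} - \frac{(\mathbf{1})^2}{N^2}, \qquad \bar{\mu}_3 = \frac{(\mathbf{3})}{N} - \frac{3(\mathbf{2})(\mathbf{1})}{N^2} + \frac{2(\mathbf{1})^3}{N^3}, \quad \text{etc.}$$

It is to be noted that, when $(\mathbf{1}) = 0$, $\bar{\mu}_t = \mu_t$.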
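Formula {1} can be checked numerically. The sketch below computes $\mu_t = (T)/N$ from the universe power sums, applies {1}, and compares the result with the moment taken directly about the universe mean. Because {2} has the same form, the same function would serve for a sample, with $m'_t$ in place of $\mu_t$. The universe values and function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def raw_moment(universe, t):
    """mu_t = (T)/N, the t-th power sum of the universe divided by N."""
    return Fraction(sum(x**t for x in universe), len(universe))


def central_from_raw(universe, t):
    """mu-bar_t by formula {1}: sum over s of (-1)^s C(t, s) mu_{t-s} mu_1^s."""
    mu1 = raw_moment(universe, 1)
    return sum((-1) ** s * comb(t, s) * raw_moment(universe, t - s) * mu1**s
               for s in range(t + 1))


def central_direct(universe, t):
    """mu-bar_t computed from deviations about the universe mean."""
    N = len(universe)
    mean = Fraction(sum(universe), N)
    return sum((x - mean) ** t for x in universe) / N
```

For a universe whose power sum $(\mathbf{1})$ is zero, such as `[-2, -1, 0, 3]`, `central_from_raw` returns the raw moment unchanged, which is the remark that closes the subsection.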

b. Moments of the sample. We denote the moments of the sample by the letter $m$ [23; 203].

In much statistical work deviations from the mean of the universe are used in place of the variates themselves. When the universe moments about the mean appear, we indicate them with a bar. However, in denoting the moments of the samples, the moments of the mean do not appear, and some other device is needed to indicate whether or not the variates are measured about the mean of the universe. The simple notations $m_t$ and $m'_t$ are used to indicate that the variates used are deviations from the mean of the universe. A superprefix is used to indicate the case in which the variates are not measured about the mean: $'m_t$, $'m'_t$. The values of $m_t$ (and $'m_t$) are obtained from the values of $m'_t$ (and $'m'_t$) by means of the formula

$$m_t = \sum_{s=0}^{t} (-1)^s \binom{t}{s} m'_{t-s}\,(m'_1)^{s}. \qquad \{2\}$$


c. Moments of the moments of a sample. Since there are many possible samples, and since a given moment can be computed for each sample, it is possible to express the expected value of this moment and the expected value of any power of it. The $\mu$'s are used for this purpose. Thus

$$\mu_r(m_t) = E(m_t)^r, \qquad \mu_r('m_t) = E('m_t)^r, \qquad \mu_r(m'_t) = E(m'_t)^r, \qquad \mu_r('m'_t) = E('m'_t)^r. \qquad \{3\}$$

If the first one of equations {3} represents the whole group, then the values $\bar{\mu}_r(m_t)$, $\bar{\mu}_r('m_t)$, $\bar{\mu}_r(m'_t)$, and $\bar{\mu}_r('m'_t)$ are indicated by

$$\bar{\mu}_r(m_t) = \sum_{s=0}^{r} (-1)^s \binom{r}{s} \mu_{r-s}(m_t)\,[\mu_1(m_t)]^{s}. \qquad \{4\}$$





d. Moments of the product of the moments of a sample. The expected value of the product of two variables $x$ and $y$ can be indicated by $E(xy) = \mu_{11}(x, y)$. Similarly, the expected value of the product of $m_a$ and $m_b$ may be indicated by $E(m_a m_b) = \mu_{11}(m_a, m_b)$. In general

$$\mu_{r_1 r_2 \cdots r_s}(m_{a_1}, m_{a_2}, \cdots, m_{a_s}) = E\left(m_{a_1}^{r_1} m_{a_2}^{r_2} \cdots m_{a_s}^{r_s}\right). \qquad \{5\}$$

In the case of the product of sample moment functions, when the universe is not measured about its mean, it is preferable to use a single superprefix, associated with the $\mu$, instead of a number of them associated with each $m$ function. Thus

$$\mu_{111}('m_a, 'm_b, 'm_c) = {}'\mu_{111}(m_a, m_b, m_c).$$


The usual laws for changing from moments to moments about the mean in the case of the multivariate distributions are available. Thus

$$\bar{\mu}_{11}(m_a, m_b) = \mu_{11}(m_a, m_b) - \mu_{10}(m_a, m_b)\,\mu_{01}(m_a, m_b),$$

$$
\begin{aligned}
\bar{\mu}_{111}(m_a, m_b, m_c) ={}& \mu_{111}(m_a, m_b, m_c) - \mu_{110}(m_a, m_b, m_c)\,\mu_{001}(m_a, m_b, m_c) \\
&- \mu_{101}(m_a, m_b, m_c)\,\mu_{010}(m_a, m_b, m_c) - \mu_{011}(m_a, m_b, m_c)\,\mu_{100}(m_a, m_b, m_c) \\
&+ 2\mu_{100}(m_a, m_b, m_c)\,\mu_{010}(m_a, m_b, m_c)\,\mu_{001}(m_a, m_b, m_c), \qquad \{7\}
\end{aligned}
$$

etc.
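Formula {7} is an algebraic identity, so it can be checked on any finite collection of equally likely triples. In the sketch below, arbitrary integer triples stand in for the values of $(m_a, m_b, m_c)$ over the possible samples. That substitution, the random data, and the function names are assumptions made only for illustration.

```python
import random
from fractions import Fraction


def mean(xs):
    return sum(xs, Fraction(0)) / len(xs)


def central_111_direct(triples):
    """E[(A - EA)(B - EB)(C - EC)] over equally likely (A, B, C) triples."""
    ea, eb, ec = (mean([t[i] for t in triples]) for i in range(3))
    return mean([(a - ea) * (b - eb) * (c - ec) for a, b, c in triples])


def central_111_eq7(triples):
    """The same quantity from product moments about the origin, as in {7}."""
    def mu(f):
        return mean([f(a, b, c) for a, b, c in triples])

    m100 = mu(lambda a, b, c: a)
    m010 = mu(lambda a, b, c: b)
    m001 = mu(lambda a, b, c: c)
    m110 = mu(lambda a, b, c: a * b)
    m101 = mu(lambda a, b, c: a * c)
    m011 = mu(lambda a, b, c: b * c)
    m111 = mu(lambda a, b, c: a * b * c)
    return (m111 - m110 * m001 - m101 * m010 - m011 * m100
            + 2 * m100 * m010 * m001)


random.seed(1)
triples = [tuple(random.randint(-5, 5) for _ in range(3)) for _ in range(40)]
```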


10. Different Sampling Laws. For theoretical purposes, any law may be used 
in the formation of samples as long as it results in functions of all possible samples 
which are symmetric functions of the variates. Any uniform law of replacement 
satisfies this condition and hence might be used in forming samples. Most 
statisticians who have worked on the sampling problem have been content to 
assume one or the other of two replacement laws. Each of these is “natural,”
since it has wide application in the study of actual sampling. 

The two types of sampling which have received general treatment are sampling 
from an infinite universe with any law of replacement and sampling from a finite 
universe with a law of no replacements. The results of the first type are also 
applicable to the case of sampling from a finite universe when replacements are 
made after each drawing. These two types of sampling have been characterized 
by the terms “sampling from an infinite universe,” or “sampling from an 
unlimited supply” [25; 114] and “sampling from a finite universe” [17], or
“sampling from a limited supply” [25; 101].

The theory of moments of moments for the first type of sampling has been developed to a high degree by such authors as Craig [22], Fisher [23], and Georgescu [28]. This extensive development has been due in part to the fact that the assumption of an infinite universe permits application of methods which are not applicable to the study of finite variation: the probability of getting a given variate remains the same at every drawing, no matter what the law of replacement. The assumption of an infinite universe at first appears to make the results inapplicable to all actual problems, where the universe is finite. However, if the universe is large, the assumption of infinite size does not greatly alter the results, although the extent of the change cannot be determined without comparison with the results of finite sampling. A justification for the use of infinite sampling in actual finite sampling problems is based on the fact that the formulas resulting from sampling from a finite parent with replacements are the same as the infinite formulas. Hence the infinite results may be used to characterize finite sampling if sampling is done with replacement after each drawing. This clever scheme is somewhat invalidated, in actual sampling, by the impracticability of replacing and remixing after each drawing. Until someone demonstrates a technique which is practical and effective in securing randomness, it must be said that the value of infinite sampling theory as applied to finite sampling depends upon the theoretically unsatisfactory assumption that a finite universe is infinite.

The theory of sampling from a finite universe without replacements has been 
developed by such authors as Isserlis [8], Tchouproff [15], Neyman [18], Church 
[19], Pepper [24], and Carver [25], although available results are not as extensive 
as those mentioned above because of the difficulty of algebraic manipulation 
and because of the length of the formulas. In this type of sampling the probability of getting a given variate varies from drawing to drawing, but a “return to the bag” is not demanded.

The terms “infinite sampling” and “finite sampling” are adequate to describe 
the two kinds of sampling discussed above, but they are inadequate in the case 
of finite sampling if additional replacement laws are introduced. Hence, it 
seems preferable to characterize the type of sampling by the replacement law 
if the population is finite. 

When the Carver functions represent known functions of n and N, it is 
possible to use them in writing moment formulas for any orderly replacement 
law. For example, it is shown in later sections how Carver functions can be 
applied to 

1. Finite sampling without replacement.

2. Finite sampling with replacement after each drawing.

3. Finite sampling without replacement up to the n-th drawing, before which the n − 1 withdrawn variates are replaced and mixed.

The Carver function can be used symbolically even in cases in which its 
explicit statement in terms of n and N has not been found. In some statistical 
formulas the Carver functions cancel, so that the results are independent of the 
sampling law. 


11. Variable Distribution Laws. It is possible to generalize the theory to 
include the case in which the variable takes on a different frequency distribution 
after each drawing, i.e., the general Tchouproff formulas can be written in terms 
of Carver functions. This theory can also be generalized to include many 
variables. In this dissertation, however, it is assumed that the universe remains 
the same, aside from the unreplaced variates forming the sample, throughout 
the sampling process. 


Chapter III. The Application of the Double Expansion Theorem 


It is the purpose of this chapter to establish the basic theorems on which the 
more specific work of the later chapters is based and to show how the double 
expansion theorem is to be applied to the sampling problem. 


12. Formulas Concerning Ordered Samples. a. Sampling with replacements. If the samples of $n$ are taken from a universe of $N$ variates and if the variates are replaced after each drawing, then the number of possible ordered samples is $N^n$, since for each of the $n$ drawings there is a choice of $N$.







b. Sampling without replacement. If the variate is not replaced after each drawing, the number of ordered samples is

$$N(N-1)\cdots(N-n+1) = N^{(n)}.$$

c. Replacement before the last drawing only. In case sampling is with replacement before the last drawing only, the number of ordered samples is

$$N(N-1)\cdots(N-n+2)\,N = N^{(n-1)}N.$$
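The three counts can be confirmed by enumerating all $N^n$ index sequences and keeping those that each law allows. The labels `"with"`, `"without"`, and `"last"` are hypothetical names for cases a, b, and c. `math.perm(N, k)` gives the factorial power $N^{(k)}$.

```python
from itertools import product
from math import perm


def count_ordered_samples(N, n, law):
    """Brute-force count of ordered samples of size n from N labelled variates.

    law: "with"    -- replacement after each drawing
         "without" -- no replacement
         "last"    -- no replacement until the n-th drawing, before which the
                      n - 1 withdrawn variates are returned and mixed
    """
    if law not in ("with", "without", "last"):
        raise ValueError(f"unknown law: {law}")
    count = 0
    for draw in product(range(N), repeat=n):
        if law == "without" and len(set(draw)) < n:
            continue
        if law == "last" and len(set(draw[:-1])) < n - 1:
            continue
        count += 1
    return count
```

For $N = 5$ and $n = 3$ the three counts are $5^3$, $N^{(3)} = 60$, and $N^{(2)}N = 100$.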


13. Theorem I. All moments of moment functions of samples can be expressed 
in terms of expected values of products of power sums of samples. 

By moment functions we mean rational integral isobaric moment functions [31; 22].

The theorem follows at once from the definitions of section 9. From {3}, {4}, 
{5}, {6}, {7} it is clear that all moments of moment functions of samples are 
expressible in terms of the expected values of sample moment functions. But 
since the sample moment functions are themselves defined in terms of power 
sums of the samples, the theorem follows. For example 


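$$\bar{\mu}_2(m_2) = \mu_2(m_2) - \mu_1^2(m_2) = E\left[\frac{(2)}{n} - \frac{(1)^2}{n^2}\right]^2 - \left\{E\left[\frac{(2)}{n} - \frac{(1)^2}{n^2}\right]\right\}^2 \qquad \{8\}$$

and

$$
\begin{aligned}
\bar{\mu}_{11}(m_2, m'_1) &= \mu_{11}(m_2, m'_1) - \mu_{10}(m_2, m'_1)\,\mu_{01}(m_2, m'_1) \\
&= E\left\{\left[\frac{(2)}{n} - \frac{(1)^2}{n^2}\right]\frac{(1)}{n}\right\} - E\left[\frac{(2)}{n} - \frac{(1)^2}{n^2}\right] E\left[\frac{(1)}{n}\right]. \qquad \{9\}
\end{aligned}
$$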


14. Theorem II. All moments of moment functions of samples can be expressed 
in terms of expected values of power products of samples. 

This follows at once from the application of the multiplication theorem of 
Part I to the theorem of section 13. Each product of power sums is expanded 
by the multiplication theorem into sums of power products. Thus 


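$$
\begin{aligned}
\mu_2('m_2) ={}& \left(\frac{1}{n^2} - \frac{2}{n^3} + \frac{1}{n^4}\right)E(4) + \left(-\frac{4}{n^3} + \frac{4}{n^4}\right)E(31) + \left(\frac{1}{n^2} - \frac{2}{n^3} + \frac{3}{n^4}\right)E(22) \\
&+ \left(-\frac{2}{n^3} + \frac{6}{n^4}\right)E(211) + \frac{1}{n^4}E(1111). \qquad \{10\}
\end{aligned}
$$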
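The expansion behind {10} can be verified numerically. Writing $'m_2 = (2)/n - (1)^2/n^2$ and expanding with the multiplication theorem gives $(2)^2 = (4) + (22)$, $(2)(1)^2 = (4) + (22) + 2(31) + (211)$, and $(1)^4 = (4) + 4(31) + 3(22) + 6(211) + (1111)$, where each power product is a sum over ordered tuples of distinct subscripts. The sketch checks these identities, and then checks {10} for a single sample with every $E(\cdot)$ removed. That suffices because expectation is linear. The sample values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def power_sum(xs, a):
    """(a) = sum of x_i^a."""
    return sum(x**a for x in xs)


def power_product(xs, *qs):
    """(q1 q2 ... qs): sum over ordered tuples of distinct subscripts."""
    return sum(prod(xs[i] ** q for i, q in zip(idx, qs))
               for idx in permutations(range(len(xs)), len(qs)))


def m2_prime_squared(xs):
    """('m2)^2 with 'm2 = (2)/n - (1)^2/n^2 for one sample."""
    n = len(xs)
    m2 = Fraction(power_sum(xs, 2), n) - Fraction(power_sum(xs, 1) ** 2, n**2)
    return m2**2


def rhs_of_10(xs):
    """Right-hand side of {10} for a single sample, i.e. with E(...) omitted."""
    n = len(xs)

    def pp(*qs):
        return power_product(xs, *qs)

    return (Fraction(n**2 - 2 * n + 1, n**4) * pp(4)
            + Fraction(-4 * n + 4, n**4) * pp(3, 1)
            + Fraction(n**2 - 2 * n + 3, n**4) * pp(2, 2)
            + Fraction(-2 * n + 6, n**4) * pp(2, 1, 1)
            + Fraction(1, n**4) * pp(1, 1, 1, 1))
```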


15. Theorem III. To every power product form $(q_1q_2\cdots q_s)$ there corresponds a power product form $(Q_1Q_2\cdots Q_s)$.

The argument is simple, since the terms of $(q_1q_2\cdots q_s)$ are themselves terms of $(Q_1Q_2\cdots Q_s)$. It follows at once that, if $(q_1q_2\cdots q_s)$ exists, then $(Q_1Q_2\cdots Q_s)$ exists.







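As an illustration, consider the universe consisting of $x_1, x_2, x_3, x_4, x_5$ and the sample consisting of $x_1, x_2, x_3, x_4$. Then the terms of

$$(q_1q_2q_3) = \sum_{\substack{i_1, i_2, i_3 = 1 \\ i_1 \ne i_2 \ne i_3}}^{4} x_{i_1}^{q_1} x_{i_2}^{q_2} x_{i_3}^{q_3}$$

are all contained in the terms of

$$(Q_1Q_2Q_3) = \sum_{\substack{i_1, i_2, i_3 = 1 \\ i_1 \ne i_2 \ne i_3}}^{5} x_{i_1}^{q_1} x_{i_2}^{q_2} x_{i_3}^{q_3}.$$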
16. Theorem IV. If definite $k$'s can be determined so that

$$E(q_1q_2\cdots q_s) = k_{q_1q_2\cdots q_s}(Q_1Q_2\cdots Q_s), \qquad \{11\}$$

then it is possible to use the double expansion theorem and express the moments of the moments of the sample in terms of the P functions of Part I and the power sums (or moments) of the universe.

The double expansion theorem was designed to replace $(q_1q_2\cdots q_s)$ by a multiple of $(Q_1Q_2\cdots Q_s)$. It can be used as well to replace $E(q_1q_2\cdots q_s)$ by $k_{q_1q_2\cdots q_s}(Q_1Q_2\cdots Q_s)$ if the values of $k_{q_1q_2\cdots q_s}$ can be determined. The results of such a substitution in terms of the power sums of the universe are then given by the double expansion theorem. For example,


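$$\mu_1('m_2) = \frac{E(2)}{n} - \frac{E(2) + E(11)}{n^2},$$

and if $E(2) = k_2(\mathbf{2})$ and $E(11) = k_{11}(\mathbf{11})$, then

$$\mu_1('m_2) = \frac{k_2(\mathbf{2})}{n} - \frac{(k_2 - k_{11})(\mathbf{2}) + k_{11}(\mathbf{1})^2}{n^2} = \frac{k_2(\mathbf{2})}{n} - \frac{K_2(\mathbf{2}) + K_{11}(\mathbf{1})^2}{n^2},$$

where $K_2 = k_2 - k_{11}$ and $K_{11} = k_{11}$.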
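The following sketch works this example for sampling without replacement. It assumes $k_2 = n/N$ (Theorem V, section 17) and $k_{11} = n(n-1)/[N(N-1)]$ (the factorial-power ratio of section 18), and it compares the resulting formula with the average of $'m_2$ over every ordered sample from a small hypothetical universe. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def m2_prime(sample):
    """'m2 = (2)/n - (1)^2/n^2 for one sample."""
    n = len(sample)
    s1 = sum(sample)
    s2 = sum(x * x for x in sample)
    return Fraction(s2, n) - Fraction(s1 * s1, n * n)


def exact_expected_m2_prime(universe, n):
    """Average of 'm2 over all ordered samples drawn without replacement."""
    values = [m2_prime(s) for s in permutations(universe, n)]
    return sum(values, Fraction(0)) / len(values)


def theorem_iv_expected_m2_prime(universe, n):
    """k2 (2)/n - [(k2 - k11)(2) + k11 (1)^2]/n^2 with universe power sums,
    using k2 = n/N and k11 = n(n-1)/(N(N-1)) (sampling without replacement)."""
    N = len(universe)
    k2 = Fraction(n, N)
    k11 = Fraction(n * (n - 1), N * (N - 1))
    U1 = sum(universe)
    U2 = sum(x * x for x in universe)
    return k2 * U2 / n - ((k2 - k11) * U2 + k11 * U1**2) / n**2
```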

It then appears that the methods and tables of Chapter I of Part I can be used in finding expressions for moments of moments, in case $k_{q_1q_2\cdots q_s}$ is known. Thus


$$\mu_2('m_2) = E\left[\frac{(2)}{n} - \frac{(1)^2}{n^2}\right]^2 = E\left[\frac{(2)^2}{n^2} - \frac{2(2)(1)^2}{n^3} + \frac{(1)^4}{n^4}\right]$$


_ Px(4) + Pul2)(2)_ fr + 2P.i(3)(1) + Pa(2)(2) + Pui DO] 


n> n3 


4 P(4) + 4P(3)(1) + 3P22(2)(2) + 6Pou(2)(1)() + Pun(1) 


nt 





and when (1) = 0 


wate) = (22 — 2P 4 Pe) aw 4 (Pr 2 4 PB), 3 


n2 n3 n> n3 nt 





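$$
\begin{aligned}
P_4 &= k_4 - 4k_{31} - 3k_{22} + 12k_{211} - 6k_{1111} \\
P_{31} &= k_{31} - 3k_{211} + 2k_{1111} \\
P_{22} &= k_{22} - 2k_{211} + k_{1111} \\
P_{211} &= k_{211} - k_{1111} \\
P_{1111} &= k_{1111}
\end{aligned}
$$

as given by {54} of Part I.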
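As a consistency check on this table, take sampling without replacement. By section 18, $k$ then depends only on the number of parts $s$ and equals $n^{(s)}/N^{(s)}$, so $k_{31} = k_{22}$. The double expansion of $E(1)^4$ should then be $P_4(\mathbf{4}) + 4P_{31}(\mathbf{3})(\mathbf{1}) + 3P_{22}(\mathbf{2})^2 + 6P_{211}(\mathbf{2})(\mathbf{1})^2 + P_{1111}(\mathbf{1})^4$. The sketch compares this with brute-force enumeration. Function names and data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import perm


def k(s, n, N):
    """k for any power product with s parts, sampling without replacement."""
    return Fraction(perm(n, s), perm(N, s))


def p_functions(n, N):
    """P's from the table, with k4 = k(1), k31 = k22 = k(2), k211 = k(3), k1111 = k(4)."""
    k4 = k(1, n, N)
    k31 = k22 = k(2, n, N)
    k211 = k(3, n, N)
    k1111 = k(4, n, N)
    return {
        "4": k4 - 4 * k31 - 3 * k22 + 12 * k211 - 6 * k1111,
        "31": k31 - 3 * k211 + 2 * k1111,
        "22": k22 - 2 * k211 + k1111,
        "211": k211 - k1111,
        "1111": k1111,
    }


def expected_fourth_power_of_sum(universe, n):
    """E(1)^4 over all ordered samples drawn without replacement."""
    samples = list(permutations(universe, n))
    return Fraction(sum(sum(s) ** 4 for s in samples), len(samples))


def double_expansion_of_fourth_power(universe, n):
    """P4 (4) + 4 P31 (3)(1) + 3 P22 (2)^2 + 6 P211 (2)(1)^2 + P1111 (1)^4."""
    P = p_functions(n, len(universe))
    U = {a: sum(x**a for x in universe) for a in (1, 2, 3, 4)}
    return (P["4"] * U[4] + 4 * P["31"] * U[3] * U[1] + 3 * P["22"] * U[2] ** 2
            + 6 * P["211"] * U[2] * U[1] ** 2 + P["1111"] * U[1] ** 4)
```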
The basic problem has thus been reduced to finding $k_{q_1q_2\cdots q_s}$ such that

$$E(q_1q_2\cdots q_s) = k_{q_1q_2\cdots q_s}(Q_1Q_2\cdots Q_s).$$


17. Theorem V. The expected value of a sample power sum is always $\frac{n}{N}$ times the corresponding universe power sum, no matter what the replacement law.

The expected value of the sample power sum is always the same, even though the $k$'s take on different values for different replacement laws. We note first that the number of ordered samples, $S$, depends upon the replacement law. Now a given sample power sum, $(a)$, has $n$ terms, while the corresponding power sum of the universe, $(A)$, has $N$ terms. All the $a$-th powers of the variates in the universe appear in the ordered samples and, if we add over all possible ordered samples, these terms appear the same number of times. Hence

$$\sum (a) = k(A) \quad\text{and}\quad \frac{\sum (a)}{(A)} = k.$$

Now the number of the $a$-th powers of the variates in $\sum (a)$ is $Sn$, so that each of the $N$ variates appears $\frac{Sn}{N}$ times. It follows that $\sum (a) = \frac{Sn}{N}(A)$, and hence that $E(a) = \frac{\sum (a)}{S} = \frac{n}{N}(A)$. Hence

$$E(a) = k_a(A), \quad\text{where } k_a = \frac{n}{N}, \qquad \{15\}$$

no matter what the law of replacement.


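An illustration may serve to clarify the argument. Consider a universe composed of $x_1, x_2, x_3$ and write the six ordered samples of two drawn without replacement: 12, 13, 21, 23, 31, 32. Then

$$\frac{\sum(a)}{(A)} = \frac{x_1^a + x_2^a + x_1^a + x_3^a + x_2^a + x_1^a + x_2^a + x_3^a + x_3^a + x_1^a + x_3^a + x_2^a}{x_1^a + x_2^a + x_3^a} = 4$$

and

$$\frac{E(a)}{(A)} = \frac{4}{6} = \frac{2}{3} = \frac{n}{N}.$$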
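As a quick numerical check of Theorem V, the short Python sketch below averages the sample power sum over every ordered sample, both without replacement (`itertools.permutations`) and with replacement (`itertools.product`), and compares the result with $\frac{n}{N}(A)$. The universe values and the helper name `expected_sample_power_sum` are illustrative assumptions, not part of the original; exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import permutations, product


def expected_sample_power_sum(universe, n, a, with_replacement):
    """Average of the sample power sum (a) = sum of x**a over all ordered samples of size n."""
    if with_replacement:
        samples = list(product(universe, repeat=n))      # N**n ordered samples
    else:
        samples = list(permutations(universe, n))         # N(N-1)...(N-n+1) ordered samples
    total = sum(sum(Fraction(x) ** a for x in s) for s in samples)
    return total / len(samples)


universe = [2, 3, 5, 11]          # hypothetical universe, N = 4
N, n, a = len(universe), 3, 2
A = sum(Fraction(x) ** a for x in universe)               # universe power sum (A)

for with_replacement in (False, True):
    e = expected_sample_power_sum(universe, n, a, with_replacement)
    print(with_replacement, e, e == Fraction(n, N) * A)
```

Both replacement laws give the same expected value, as the theorem asserts, even though the number of ordered samples differs.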





112 PAUL S. DWYER 


18. Value of $k_{p_1\cdots p_s}$ for sampling without replacement. Consider a universe and all possible ordered samples. Form $(Q_1Q_2\cdots Q_s)$ and $\sum(q_1q_2\cdots q_s)$. Now $\sum(q_1q_2\cdots q_s)$ is a symmetric function of the variates and consists of $N^{(n)}n^{(s)}$ products, while $(Q_1Q_2\cdots Q_s)$ consists of $N^{(s)}$ products, where $x^{(s)} = x(x-1)\cdots(x-s+1)$. Each of the $N^{(s)}$ products is repeated the same number of times in the $N^{(n)}n^{(s)}$ products of $\sum(q_1q_2\cdots q_s)$. To find the number of times such repetition is made, it is only necessary to divide the total number of terms in $\sum(q_1q_2\cdots q_s)$ by the number of terms in $(Q_1Q_2\cdots Q_s)$, which gives $\frac{N^{(n)}n^{(s)}}{N^{(s)}}$. Hence

$$\sum(q_1q_2\cdots q_s) = \frac{N^{(n)}n^{(s)}}{N^{(s)}}(Q_1Q_2\cdots Q_s)$$

and, dividing by the number of ordered samples, $N^{(n)}$,

$$E(q_1q_2\cdots q_s) = \frac{n^{(s)}}{N^{(s)}}(Q_1Q_2\cdots Q_s),$$

so that

$$k_{p_1\cdots p_s} = \frac{n^{(s)}}{N^{(s)}},$$

as stated in section 46 of Part I.
Since $(q_1q_2\cdots q_s) = s_1!s_2!\cdots s_p!\,M(q_1q_2\cdots q_s)$ and $(Q_1Q_2\cdots Q_s) = s_1!s_2!\cdots s_p!\,M(Q_1Q_2\cdots Q_s)$, it follows that

$$E\,M(q_1q_2\cdots q_s) = \frac{n^{(s)}}{N^{(s)}}M(Q_1Q_2\cdots Q_s). \qquad\{19\}$$
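The counting argument of section 18 can be checked by brute force. The sketch below (hypothetical helper names `falling`, `augmented`, `expected_augmented`; an arbitrary five-value universe) computes the augmented sum $(q_1q_2\cdots q_s)$ over ordered tuples of distinct positions and confirms that its average over all ordered samples drawn without replacement equals $\frac{n^{(s)}}{N^{(s)}}(Q_1Q_2\cdots Q_s)$.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1); equals 0 when s > x >= 0."""
    return prod(range(x, x - s, -1))


def augmented(values, exponents):
    """(q1 q2 ... qs): sum over ordered tuples of distinct positions of prod x_i**q_i."""
    return sum(
        prod(Fraction(values[i]) ** q for i, q in zip(idx, exponents))
        for idx in permutations(range(len(values)), len(exponents))
    )


def expected_augmented(universe, n, exponents):
    """E(q1 ... qs) over all ordered samples of size n drawn without replacement."""
    samples = list(permutations(universe, n))
    return sum(augmented(s, exponents) for s in samples) / len(samples)


universe = [1, 2, 4, 7, 9]           # hypothetical universe, N = 5
N, n = len(universe), 3
for exponents in [(2,), (2, 1), (1, 1, 1), (3, 1)]:
    s = len(exponents)
    k = Fraction(falling(n, s), falling(N, s))     # k_{p1...ps} = n^(s) / N^(s)
    print(exponents, expected_augmented(universe, n, exponents) == k * augmented(universe, exponents))
```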


Most earlier writers on finite sampling have used the idea expressed in {19} as the foundation of their work. They have found it necessary to undertake enormous algebraic manipulation to expand in terms of monomial symmetric functions and then to expand back in terms of power sums after making the coefficient adjustment. Such long derivations are not only laborious, but they are also apt to result in algebraic errors, and the results obtained have not emphasized the symmetry which is inherent in the nature of the problem and which is very useful in checking calculations. It was Carver who first discovered the type of symmetric relation involved and who used it in obtaining a compact statement of the first eight moments of the sample sum in the case of a single variable. He, too, found it necessary to carry out extensive algebraic manipulations, as his reference to "lavish use of symmetric functions" [25; 104] reveals. His keen insight into the essential nature of this problem led him to the conclusion that such extensive algebraic manipulation should not be necessary and that it should be possible to apply P functions to sample moments of order higher than the first. His confidence that this could be done and his encouragement in the task have contributed in a large degree to whatever merit this dissertation may have.


With $k_{p_1\cdots p_s} = n^{(s)}/N^{(s)}$ it is at once possible to write the P function expansions. Following Carver, we let $p_1 = \frac{n}{N}$, $p_2 = \frac{n(n-1)}{N(N-1)}$, etc., and get, from sections 43 and 44 of Part I,

$$\begin{aligned}
P_1 &= p_1 & P_{11} &= p_2\\
P_2 &= p_1 - p_2 & P_{21} &= p_2 - p_3\\
P_3 &= p_1 - 3p_2 + 2p_3 & P_{111} &= p_3\\
P_4 &= p_1 - 7p_2 + 12p_3 - 6p_4 & P_{31} &= p_2 - 3p_3 + 2p_4\\
P_{22} &= p_2 - 2p_3 + p_4 & P_{211} &= p_3 - p_4\\
P_{1111} &= p_4
\end{aligned}$$

etc.
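The entries of weight 4 follow mechanically from {54} of Part I once $k_{p_1\cdots p_s} = p_s$ is substituted, since each k then depends only on the number of parts. A minimal sketch of that bookkeeping, assuming the k-combinations are copied exactly from {54} (the dictionary name `P_IN_K` is illustrative):

```python
from collections import defaultdict

# The P functions of weight 4 as combinations of k's, keyed by the k subscript, from {54} of Part I.
P_IN_K = {
    "P4":    {(4,): 1, (3, 1): -4, (2, 2): -3, (2, 1, 1): 12, (1, 1, 1, 1): -6},
    "P31":   {(3, 1): 1, (2, 1, 1): -3, (1, 1, 1, 1): 2},
    "P22":   {(2, 2): 1, (2, 1, 1): -2, (1, 1, 1, 1): 1},
    "P211":  {(2, 1, 1): 1, (1, 1, 1, 1): -1},
    "P1111": {(1, 1, 1, 1): 1},
}


def in_terms_of_p(k_combination):
    """Without replacement k_{p1...ps} = p_s depends only on s, the number of parts.

    Returns {s: coefficient of p_s}, dropping zero coefficients.
    """
    coeffs = defaultdict(int)
    for partition, c in k_combination.items():
        coeffs[len(partition)] += c
    return {s: c for s, c in sorted(coeffs.items()) if c != 0}


for name, combo in P_IN_K.items():
    terms = " ".join(f"{c:+d}*p{s}" for s, c in in_terms_of_p(combo).items())
    print(f"{name} = {terms}")
```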
19. Expected Values of Products of Sample Power Sums, Sampling Without Replacement. The tables of Chapter I of Part I are now available for use. Thus


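$$\mu'_3({}'m_1) = E({}'m_1)^3 = \frac{1}{n^3}E(1)^3 = \frac{1}{n^3}\left[P_3(3) + 3P_{21}(2)(1) + P_{111}(1)^3\right], \qquad\{20\}$$

where

$$P_3 = \frac{n}{N} - \frac{3n(n-1)}{N(N-1)} + \frac{2n(n-1)(n-2)}{N(N-1)(N-2)}, \quad P_{21} = \frac{n(n-1)}{N(N-1)} - \frac{n(n-1)(n-2)}{N(N-1)(N-2)}, \quad P_{111} = \frac{n(n-1)(n-2)}{N(N-1)(N-2)}.$$

Formula {20} might be written as

$$\mu'_3({}'m_1) = \frac{1}{n^3}\left[P_3N\mu'_3 + 3P_{21}N^2\mu'_2\mu'_1 + P_{111}N^3{\mu'_1}^3\right]. \qquad\{21\}$$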
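Formula {21} can be confirmed exactly on a small example. The sketch below assumes a hypothetical universe of six values and samples of four drawn without replacement; it builds $P_3$, $P_{21}$, $P_{111}$ from the $p_s$ of section 18 and compares {21} with the direct average of $({}'m_1)^3$ over all $N^{(n)}$ ordered samples.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1)."""
    return prod(range(x, x - s, -1))


universe = [Fraction(v) for v in (1, 3, 4, 8, 9, 12)]   # hypothetical universe, N = 6
N, n = len(universe), 4

p = {s: Fraction(falling(n, s), falling(N, s)) for s in (1, 2, 3)}
P3, P21, P111 = p[1] - 3 * p[2] + 2 * p[3], p[2] - p[3], p[3]

# Universe moments about the origin, mu'_r = (r)/N.
mu = {r: sum(x**r for x in universe) / N for r in (1, 2, 3)}

# Formula {21}.
formula = (P3 * N * mu[3] + 3 * P21 * N**2 * mu[2] * mu[1] + P111 * N**3 * mu[1] ** 3) / n**3

# Direct average of the cubed sample mean over all ordered samples without replacement.
samples = list(permutations(universe, n))
brute = sum((sum(s) / n) ** 3 for s in samples) / len(samples)

print(formula, brute, formula == brute)
```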


We note further that as $N \to \infty$

$$NP_3 \to n, \qquad P_{21}N^2 \to n(n-1), \qquad P_{111}N^3 \to n(n-1)(n-2),$$

so that

$$\mu'_3({}'m_1) = \frac{1}{n^3}\left[n\mu'_3 + 3n(n-1)\mu'_2\mu'_1 + n(n-1)(n-2){\mu'_1}^3\right]. \qquad\{22\}$$

More generally

$$P_{m_1\cdots m_r}(Q_1)(Q_2)\cdots(Q_r) = P_{m_1\cdots m_r}N^r\mu'_{Q_1}\mu'_{Q_2}\cdots\mu'_{Q_r}. \qquad\{23\}$$

As N approaches infinity this becomes

$$P_{m_1\cdots m_r}(Q_1)(Q_2)\cdots(Q_r) = n^{(r)}\mu'_{Q_1}\mu'_{Q_2}\cdots\mu'_{Q_r}. \qquad\{24\}$$

The laws of infinite sampling may be obtained by replacing power sums by moments and $P_{m_1\cdots m_r}$ by $n^{(r)}$. The tables given in a recent paper [31; 30-32] were obtained from the tables of P functions by this method.


20. Sampling With Replacements. We next consider the case of finite sampling with replacement after each drawing. This is such a simple case that the P's can be determined without finding the k's.

Consider a universe and the $N^n$ possible ordered samples. Thus the nine ordered samples of 2 from a universe of 3 are indicated by the subscripts

11  21  31
12  22  32
13  23  33

The samples 11, 22, 33 are not repeated, while the others are (12 and 21, for example, contain the same variates). The multiplication theorem can be used in grouping types of product terms as it was in Part I, but the terms themselves have a different interpretation. Thus $(1)(1) = (2) + (11)$ can be written as $(1)(1) = (2) + [11]$, where (2) indicates the sum of the n terms found by multiplying an x by itself, while [11] indicates the sum of the $n(n-1)$ products formed by multiplying one x by another. Since some of the x's may be alike, it is possible to have squared terms in [11], but they are not treated as squared terms, but rather as products. For example, for the sample 11,

$$(1) = x_1 + x_1, \qquad (1)(1) = x_1^2 + x_1^2 + x_1x_1 + x_1x_1,$$

so that $(2) = x_1^2 + x_1^2$ and $[11] = x_1x_1 + x_1x_1$.


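In determining the expected value of (1)(1), we note that

$$\sum(1)(1) = \sum(2) + \sum[11],$$

where $\sum$ holds for the $N^n$ possible samples. Now $\sum(2) = k_1(2)$ with $k_1 = \frac{nN^n}{N}$, so that $E(2) = \frac{n}{N}(2)$, as indicated in Theorem V. Also $\sum[11]$ is composed of $N^nn^{(2)}$ products, in which each of the $N^2$ products $x_ix_j$ $(i, j = 1, \ldots, N)$ appears equally often, and

$$\sum_{i,j=1}^{N}x_ix_j = \left(\sum_{i=1}^{N}x_i\right)\left(\sum_{j=1}^{N}x_j\right).$$

It follows that $\sum[11] = \frac{N^nn^{(2)}}{N^2}(1)(1)$ and that

$$E[11] = \frac{n^{(2)}}{N^2}(1)(1).$$

It appears that $\frac{n^{(2)}}{N^2}$ plays the role of $P_{11}$. Thus

$$\mu'_2({}'m_1) = \frac{1}{n^2}E\left[(2) + [11]\right] = \frac{1}{n^2}\left[P_2(2) + P_{11}(1)(1)\right],$$

where $P_2 = \frac{n}{N}$ and $P_{11} = \frac{n^{(2)}}{N^2}$.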


The corresponding argument holds for the general case. Any product of power sums can be expanded in terms of $(q_1q_2\cdots q_s)$. If duplicate variates are introduced, use the notation $[q_1q_2\cdots q_s]$. Form $[q_1q_2\cdots q_s]$ for all the $N^n$ ordered samples. Now $[q_1q_2\cdots q_s]$ has $n^{(s)}$ terms and $\sum[q_1q_2\cdots q_s] = k(Q_1)(Q_2)\cdots(Q_s)$ has $n^{(s)}N^n$ terms, while $(Q_1)(Q_2)\cdots(Q_s)$ has $N^s$ terms. It follows that $k = \frac{n^{(s)}N^n}{N^s}$, that

$$\sum[q_1q_2\cdots q_s] = \frac{n^{(s)}N^n}{N^s}(Q_1)(Q_2)\cdots(Q_s),$$

and that

$$E[q_1q_2\cdots q_s] = \frac{n^{(s)}}{N^s}(Q_1)(Q_2)\cdots(Q_s).$$

Hence

$$P_{m_1\cdots m_s} = \frac{n^{(s)}}{N^s}.$$

In general

$$P_{m_1\cdots m_r}(Q_1)(Q_2)\cdots(Q_r) = n^{(r)}\mu'_{Q_1}\mu'_{Q_2}\cdots\mu'_{Q_r}.$$

Comparison with {24} shows that the same basic laws appear whether sampling is carried on with replacement or, in the infinite case, without replacement.
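For sampling with replacement the same kind of exact check applies, now using $P = n^{(r)}/N^r$, where r is the number of parts of the subscript. The sketch below (hypothetical three-value universe, samples of four; the function name `P` is illustrative) compares $\mu'_2({}'m_1)$ and $\mu'_3({}'m_1)$ from the P function expansions with direct averages over all $N^n$ ordered samples.

```python
from fractions import Fraction
from itertools import product
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1)."""
    return prod(range(x, x - s, -1))


universe = [Fraction(v) for v in (2, 3, 7)]     # hypothetical universe, N = 3
N, n = len(universe), 4


def P(parts):
    """With replacement, a P function depends only on its number of parts r: n^(r) / N^r."""
    return Fraction(falling(n, parts), N**parts)


S = {r: sum(x**r for x in universe) for r in (1, 2, 3)}     # universe power sums (r)

# P2 and P3 have one part, P11 and P21 two parts, P111 three parts.
mu2_formula = (P(1) * S[2] + P(2) * S[1] ** 2) / n**2
mu3_formula = (P(1) * S[3] + 3 * P(2) * S[2] * S[1] + P(3) * S[1] ** 3) / n**3

samples = list(product(universe, repeat=n))                # all N**n ordered samples
mu2_brute = sum((sum(s) / n) ** 2 for s in samples) / len(samples)
mu3_brute = sum((sum(s) / n) ** 3 for s in samples) / len(samples)
print(mu2_formula == mu2_brute, mu3_formula == mu3_brute)
```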


21. Other Replacement Laws. The two cases just examined represent two extremes of orderly replacement laws. It has been shown in each case how the Carver functions can be used to express relations between the moments of the moments of the sample and the moments of the universe. It is possible to show how these functions are applicable to other replacement laws. We take, as an illustration, the case in which no replacements are made after each of the first $n - 1$ drawings, but just before the last drawing the $n - 1$ variates are replaced and mixed. I do not present here the detailed argument, but simply indicate that the appropriate value of $k_{p_1\cdots p_s}$ is


ky. **Ds 


s) 
n' n— 1 


= yo + yon lm - 2) —n® + (27427 4... +2")(n — 2)°°”] {28} 







22. Different Frequency Laws. The distribution of variates may follow some
known frequency law such as the normal, rectangular, binomial, Poisson, etc.
In such a case, if the relations between the moments are known, it is possible 
to simplify the results. 


Chapter IV. The Moments of the Mean 


To illustrate the previous theory in a simple situation we consider the moments 
of the mean. Carver [25] has done this previously for the case of finite sampling 
without replacements, but he has taken the measures of the universe as devia- 
tions and has used the sample sum rather than the sample mean. O’Toole [26] 
has generalized Carver’s work. 


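23. The Moments of the Mean. We have at once

$$\mu'_1({}'m_1) = \frac{1}{n}E(1) = \frac{1}{n}P_1(1)$$

$$\mu'_2({}'m_1) = \frac{1}{n^2}E(1)^2 = \frac{1}{n^2}\left[P_2(2) + P_{11}(1)(1)\right]$$

$$\mu'_3({}'m_1) = \frac{1}{n^3}E(1)^3 = \frac{1}{n^3}\left[P_3(3) + 3P_{21}(2)(1) + P_{111}(1)^3\right]$$

$$\mu'_4({}'m_1) = \frac{1}{n^4}E(1)^4 = \frac{1}{n^4}\left[P_4(4) + 4P_{31}(3)(1) + 3P_{22}(2)(2) + 6P_{211}(2)(1)(1) + P_{1111}(1)^4\right]$$

$$\mu'_r({}'m_1) = \frac{1}{n^r}\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\pi_1!\cdots\pi_s!}P_{p_1^{\pi_1}\cdots p_s^{\pi_s}}(p_1)^{\pi_1}\cdots(p_s)^{\pi_s}, \qquad\{29\}$$

where the $\sum$ holds for all partitions $p_1^{\pi_1}\cdots p_s^{\pi_s}$ of r.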
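The multinomial coefficients in {29} count the ways of grouping r factors (1) into blocks of the sizes given by the partition. A small Python sketch (the helpers `partitions` and `coefficient` are illustrative names, not from the original) reproduces the coefficients 1, 4, 3, 6, 1 of the $\mu'_4({}'m_1)$ expansion:

```python
from collections import Counter
from math import factorial


def partitions(r, largest=None):
    """Partitions of r as non-increasing tuples: 4 -> (4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)."""
    largest = r if largest is None else largest
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def coefficient(partition):
    """r! / ((p1!)^pi1 ... (ps!)^pis * pi1! ... pis!), where pi_j is the multiplicity of part p_j."""
    r = sum(partition)
    denom = 1
    for part, mult in Counter(partition).items():
        denom *= factorial(part) ** mult * factorial(mult)
    return factorial(r) // denom


for part in partitions(4):
    print(part, coefficient(part))
```

Summed over all partitions of r, the coefficients give the number of ways of splitting r labelled factors into blocks (15 for r = 4).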


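24. Moments About the Mean of the Sample Mean. Using {1}, we get

$$\bar\mu_2({}'m_1) = \frac{1}{n^2}\left[P_2(2) + (P_{11} - P_1^2)(1)(1)\right]$$

$$\bar\mu_3({}'m_1) = \frac{1}{n^3}\left[P_3(3) + 3(P_{21} - P_2P_1)(2)(1) + (P_{111} - 3P_{11}P_1 + 2P_1^3)(1)^3\right]$$

$$\bar\mu_4({}'m_1) = \frac{1}{n^4}\left[P_4(4) + 4(P_{31} - P_3P_1)(3)(1) + 3P_{22}(2)(2) + 6(P_{211} - 2P_{21}P_1 + P_2P_1^2)(2)(1)(1) + (P_{1111} - 4P_{111}P_1 + 6P_{11}P_1^2 - 3P_1^4)(1)^4\right] \qquad\{30\}$$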





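These formulas can be written in the notation of moments of the universe as

$$\bar\mu_2({}'m_1) = \frac{1}{n^2}\left[P_2N\mu'_2 + (P_{11} - P_1^2)N^2{\mu'_1}^2\right]$$

$$\bar\mu_3({}'m_1) = \frac{1}{n^3}\left[P_3N\mu'_3 + 3(P_{21} - P_2P_1)N^2\mu'_2\mu'_1 + (P_{111} - 3P_{11}P_1 + 2P_1^3)N^3{\mu'_1}^3\right] \qquad\{31\}$$

etc.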


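25. Moments of the Sample Mean When the Universe is Measured About its Mean. When (1) = 0, the formulas of section 23 become

$$\mu'_1(m_1) = 0$$

$$\mu'_2(m_1) = \frac{1}{n^2}P_2(2)$$

$$\mu'_3(m_1) = \frac{1}{n^3}P_3(3)$$

$$\mu'_4(m_1) = \frac{1}{n^4}\left[P_4(4) + 3P_{22}(2)^2\right]$$

$$\mu'_r(m_1) = \frac{1}{n^r}\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\pi_1!\cdots\pi_s!}P_{p_1^{\pi_1}\cdots p_s^{\pi_s}}(p_1)^{\pi_1}\cdots(p_s)^{\pi_s}, \qquad\{32\}$$

where the $\sum$ holds for all partitions having no unit parts. In the language of moments {32} becomes

$$\mu'_r(m_1) = \frac{1}{n^r}\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\pi_1!\cdots\pi_s!}P_{p_1^{\pi_1}\cdots p_s^{\pi_s}}N^{\pi_1+\cdots+\pi_s}\bar\mu_{p_1}^{\pi_1}\cdots\bar\mu_{p_s}^{\pi_s}, \qquad\{33\}$$

where the $\sum$ holds for all partitions of r having no unit parts.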


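26. Moments About the Mean of the Sample Mean When the Universe is Measured From its Mean. Similarly, when (1) = 0, the results of section 24 become

$$\bar\mu_2(m_1) = \frac{1}{n^2}P_2(2) = \frac{1}{n^2}P_2N\bar\mu_2$$

$$\bar\mu_3(m_1) = \frac{1}{n^3}P_3(3) = \frac{1}{n^3}P_3N\bar\mu_3$$

$$\bar\mu_4(m_1) = \frac{1}{n^4}\left[P_4(4) + 3P_{22}(2)^2\right] = \frac{1}{n^4}\left[P_4N\bar\mu_4 + 3P_{22}N^2\bar\mu_2^2\right]$$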













It is to be noticed that the values $\bar\mu_r(m_1)$ are equal to the values $\mu'_r(m_1)$. This results from {4} and the fact that $\mu'_1(m_1) = 0$. It should be noted also that $\bar\mu_r({}'m_1) \neq \mu'_r({}'m_1)$, as $\mu'_1({}'m_1) \neq 0$.





27. Sampling Without Replacements. The formulas in sections 23-26 are general formulas which become more specific as given replacement laws are introduced. If the law is sampling without replacement, we recall that $P_1 = p_1$, $P_2 = p_1 - p_2$, $P_3 = p_1 - 3p_2 + 2p_3$, etc., where $p_s = \frac{n^{(s)}}{N^{(s)}}$. It is at once possible to write the appropriate formula. Thus

$$\bar\mu_3(m_1) = \mu'_3(m_1) = \frac{1}{n^3}\left[p_1 - 3p_2 + 2p_3\right]N\bar\mu_3 = \frac{(N-n)(N-2n)}{n^2(N-1)(N-2)}\bar\mu_3. \qquad\{35\}$$

Now $\bar\mu_3 = 0$ in any symmetric universe, for example a normal or rectangular one, so $\bar\mu_3(m_1) = 0$.
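Formula {35} can be verified exactly for a small skewed universe. The sketch below, assuming an arbitrary six-value universe shifted to have mean zero, averages $m_1^3$ over all ordered samples drawn without replacement for every n from 1 to N; note that the factor $N - 2n$ makes the third moment change sign once $n > N/2$.

```python
from fractions import Fraction
from itertools import permutations

raw = [Fraction(v) for v in (0, 1, 2, 7, 10, 16)]     # hypothetical skewed universe
mean = sum(raw) / len(raw)
universe = [x - mean for x in raw]                     # measured about its mean, so (1) = 0
N = len(universe)
mu3 = sum(x**3 for x in universe) / N                  # third moment of the universe about its mean


def third_moment_of_mean(n):
    """Average of m1**3 over all ordered samples of size n drawn without replacement."""
    samples = list(permutations(universe, n))
    return sum((sum(s) / n) ** 3 for s in samples) / len(samples)


for n in range(1, N + 1):
    formula = Fraction((N - n) * (N - 2 * n), n**2 * (N - 1) * (N - 2)) * mu3
    print(n, third_moment_of_mean(n) == formula)
```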





(r) 
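28. Sampling With Replacements. In this case $P_{m_1\cdots m_r} = \frac{n^{(r)}}{N^r}$ and we have

$$\mu'_1({}'m_1) = \mu'_1$$

$$\mu'_2({}'m_1) = \frac{1}{n^2}\left[n\mu'_2 + n(n-1){\mu'_1}^2\right]$$

$$\mu'_3({}'m_1) = \frac{1}{n^3}\left[n\mu'_3 + 3n(n-1)\mu'_2\mu'_1 + n(n-1)(n-2){\mu'_1}^3\right]$$

$$\mu'_4({}'m_1) = \frac{1}{n^4}\left[n\mu'_4 + 4n(n-1)\mu'_3\mu'_1 + 3n(n-1){\mu'_2}^2 + 6n(n-1)(n-2)\mu'_2{\mu'_1}^2 + n^{(4)}{\mu'_1}^4\right]$$

and in general

$$\mu'_r({}'m_1) = \frac{1}{n^r}\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\pi_1!\cdots\pi_s!}n^{(\pi_1+\cdots+\pi_s)}{\mu'_{p_1}}^{\pi_1}\cdots{\mu'_{p_s}}^{\pi_s} \qquad\{36\}$$

and

$$\bar\mu_2({}'m_1) = \frac{1}{n^2}\left[n\mu'_2 - n{\mu'_1}^2\right]$$

$$\bar\mu_3({}'m_1) = \frac{1}{n^3}\left[n\mu'_3 - 3n\mu'_2\mu'_1 + 2n{\mu'_1}^3\right] \qquad\{37\}$$

$$\bar\mu_4({}'m_1) = \frac{1}{n^4}\left[n\mu'_4 - 4n\mu'_3\mu'_1 + 3n(n-1){\mu'_2}^2 - 6n(n-2)\mu'_2{\mu'_1}^2 + 3n(n-2){\mu'_1}^4\right]$$

When the universe is measured about its mean, these become

$$\bar\mu_2(m_1) = \mu'_2(m_1) = \frac{\bar\mu_2}{n}$$

$$\bar\mu_3(m_1) = \mu'_3(m_1) = \frac{\bar\mu_3}{n^2}$$

$$\bar\mu_4(m_1) = \mu'_4(m_1) = \frac{1}{n^3}\left[\bar\mu_4 + 3(n-1)\bar\mu_2^2\right] \qquad\{38\}$$

etc.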
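A brute-force check of {38} for sampling with replacement, assuming a small hypothetical universe measured about its mean; the second, third and fourth moments of the sample mean over all $N^n$ ordered samples are compared with the three formulas:

```python
from fractions import Fraction
from itertools import product

raw = [Fraction(v) for v in (0, 1, 5, 6, 13)]          # hypothetical universe
mean = sum(raw) / len(raw)
universe = [x - mean for x in raw]                      # measured about its mean, so (1) = 0
N, n = len(universe), 3

mu2 = sum(x**2 for x in universe) / N                    # central moments of the universe
mu3 = sum(x**3 for x in universe) / N
mu4 = sum(x**4 for x in universe) / N

samples = list(product(universe, repeat=n))             # all N**n ordered samples


def moment_of_mean(r):
    """r-th moment of the sample mean m1 (its expectation is 0 here)."""
    return sum((sum(s) / n) ** r for s in samples) / len(samples)


checks = {
    2: mu2 / n,
    3: mu3 / n**2,
    4: (mu4 + 3 * (n - 1) * mu2**2) / n**3,             # formula {38}
}
for r, value in checks.items():
    print(r, moment_of_mean(r) == value)
```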


29. Sampling With Replacements Before the Last Drawing Only. The 
values of ky,...p, of section 21 determine the values of the P’s. Thus P: = 
n n(n — 1) 2(n — 1) _ n(n — 1) R. — 1) 
lc: li SOD ie, ts age sees Py, = ad 
he — Iku = 5 — WOW) + NIN) OE = NW 1) — MO —  © 


je('my) = X[n- a ie ar ee bana <= A} {39} 


jio(m) = = i[ - J Xn — | i {40} 


30. Different Frequency Laws. As indicated in section 22, the frequency distribution of the parent may be characterized by some moment relationship. This relationship can be inserted and the resulting formula simplified. For example, if the law of the formation of the universe is that of the hypergeometric series [25; 113]

$$\bar\mu_r = pq\left[q^{r-1} + (-1)^rp^{r-1}\right], \qquad\{41\}$$

we have

$$\bar\mu_2(m_1) = \frac{1}{n^2}P_2Npq$$

$$\bar\mu_3(m_1) = \frac{1}{n^3}P_3Npq(q^2 - p^2)$$

$$\bar\mu_4(m_1) = \frac{1}{n^4}\left[P_4Npq(q^3 + p^3) + 3P_{22}N^2p^2q^2\right]$$

etc., where the values of $P_2$, $P_3$, $P_4$, $P_{22}$ are to be inserted according to the replacement law which is used in forming the samples. The results for sampling without replacement agree with those given by Pearson [1].


31. Moments of the Sample Sum. We might use the sum of the items in the sample instead of the sample mean. For example

$$\mu'_2(1) = E(1)^2 = n^2E({}'m_1)^2 = n^2\mu'_2({}'m_1).$$

The results would parallel the results above except that the $n^r$ in the denominator would be eliminated. It is the sample sum which is used in Carver's article [25], and this should be noted in comparing results.


Chapter V. The Mean and Variance of the Variance 





As a further illustration of the use of the Carver functions there are presented 
in this chapter formulas for the mean of the variance and the variance of the 
variance. 






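32. The Mean of the Variance.

$$\mu'_1(m_2) = E\left[\frac{(2)}{n} - \frac{(1)(1)}{n^2}\right] = \frac{1}{n}P_1(2) - \frac{1}{n^2}\left[P_2(2) + P_{11}(1)(1)\right]$$

$$= \frac{1}{n^2}\left[(nP_1 - P_2)(2) - P_{11}(1)(1)\right]. \qquad\{43\}$$

If the universe is measured about its mean, (1) = 0 and

$$\mu'_1(m_2) = \frac{1}{n^2}(nP_1 - P_2)N\bar\mu_2. \qquad\{44\}$$

When sampling is with replacements, $P_1 = P_2 = \frac{n}{N}$ and we get the well-known

$$\mu'_1(m_2) = \frac{n-1}{n}\bar\mu_2, \qquad\{45\}$$

while when sampling is without replacements, we have the well-known

$$\mu'_1(m_2) = \frac{(n-1)N}{n(N-1)}\bar\mu_2. \qquad\{46\}$$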
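{45} and {46} can be confirmed by enumerating every ordered sample. Because $m_2$ does not change when a constant is added to every variate, the sketch below works with an arbitrary (hypothetical) universe that is not measured about its mean, and compares the average of $m_2$ with the two formulas.

```python
from fractions import Fraction
from itertools import permutations, product

raw = [Fraction(v) for v in (1, 2, 6, 9, 12)]      # hypothetical universe
N = len(raw)
mean = sum(raw) / N
mu2 = sum((x - mean) ** 2 for x in raw) / N        # universe variance (divisor N)


def sample_variance(s):
    """m2 = (2)/n - (1)(1)/n**2, the sample variance with divisor n."""
    n = len(s)
    return sum(x * x for x in s) / n - (sum(s) / n) ** 2


def mean_of_m2(n, with_replacement):
    samples = list(product(raw, repeat=n)) if with_replacement else list(permutations(raw, n))
    return sum(sample_variance(s) for s in samples) / len(samples)


n = 3
print(mean_of_m2(n, True) == Fraction(n - 1, n) * mu2)                       # {45}
print(mean_of_m2(n, False) == Fraction((n - 1) * N, n * (N - 1)) * mu2)     # {46}
```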






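33. The Second Moment of the Variance.

$$\mu'_2(m_2) = E\left[\frac{(2)}{n} - \frac{(1)(1)}{n^2}\right]^2$$

becomes

$$\mu'_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4) - 4\left(\frac{P_{21}}{n^3} - \frac{P_{31}}{n^4}\right)(3)(1) + \left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right)(2)(2)$$
$$- 2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4}\right)(2)(1)(1) + \frac{P_{1111}}{n^4}(1)^4 \qquad\{47\}$$

and, when (1) = 0,

$$\mu'_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4) + \left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right)(2)(2). \qquad\{48\}$$

These of course can be written in terms of moments of the variance.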


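34. The Variance of the Variance. Since $\bar\mu_2(m_2) = \mu'_2(m_2) - {\mu'_1}^2(m_2)$, we have

$$\bar\mu_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4) - 4\left(\frac{P_{21}}{n^3} - \frac{P_{31}}{n^4}\right)(3)(1)$$
$$+ \left[\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4} - \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right](2)(2)$$
$$- 2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4} - \frac{P_1P_{11}}{n^3} + \frac{P_2P_{11}}{n^4}\right)(2)(1)(1) + \frac{P_{1111} - P_{11}^2}{n^4}(1)^4. \qquad\{49\}$$

Formula {49} may also be written as

$$\bar\mu_2(m_2) = \frac{1}{n^4}\Big\{(n^2P_2 - 2nP_3 + P_4)N\mu'_4 - 4(nP_{21} - P_{31})N^2\mu'_3\mu'_1$$
$$+ (n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2{\mu'_2}^2$$
$$- 2(nP_{111} - 3P_{211} - nP_1P_{11} + P_2P_{11})N^3\mu'_2{\mu'_1}^2 + (P_{1111} - P_{11}^2)N^4{\mu'_1}^4\Big\}. \qquad\{50\}$$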


Formulas {49} and {50} are not expressed in terms of deviations of the variates. 
Neither do they assume any particular replacement law nor any particular 
type of universe. 

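In case the universe is measured about its mean we can write at once, by placing (1) = 0 in {49},

$$\bar\mu_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4) + \left[\left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right) - \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right](2)(2) \qquad\{51\}$$

and

$$\bar\mu_2(m_2) = \frac{1}{n^4}\Big\{(n^2P_2 - 2nP_3 + P_4)N\bar\mu_4 + (n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2\bar\mu_2^2\Big\}. \qquad\{52\}$$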
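Formula {52} can be tested against a direct enumeration. In the sketch below, the P functions for sampling without replacement are built from $p_s = n^{(s)}/N^{(s)}$ as in section 18, the universe (hypothetical values) is measured about its mean, and the variance of $m_2$ over all ordered samples is compared with {52} for every sample size from 2 to N. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1)."""
    return prod(range(x, x - s, -1))


raw = [Fraction(v) for v in (0, 1, 3, 4, 9, 13)]       # hypothetical universe
N = len(raw)
mean = sum(raw) / N
universe = [x - mean for x in raw]                      # measured about its mean
mu2 = sum(x**2 for x in universe) / N
mu4 = sum(x**4 for x in universe) / N


def carver_P(n):
    """P functions for sampling without replacement, from p_s = n^(s)/N^(s)."""
    p = {s: Fraction(falling(n, s), falling(N, s)) for s in range(1, 5)}
    return {
        "1": p[1], "2": p[1] - p[2], "3": p[1] - 3 * p[2] + 2 * p[3],
        "4": p[1] - 7 * p[2] + 12 * p[3] - 6 * p[4],
        "11": p[2], "21": p[2] - p[3], "22": p[2] - 2 * p[3] + p[4],
    }


def variance_of_variance_formula(n):
    """Formula {52}."""
    P = carver_P(n)
    a = n**2 * P["2"] - 2 * n * P["3"] + P["4"]
    b = (n**2 * P["11"] - 2 * n * P["21"] + 3 * P["22"]
         - n**2 * P["1"] ** 2 + 2 * n * P["1"] * P["2"] - P["2"] ** 2)
    return (a * N * mu4 + b * N**2 * mu2**2) / n**4


def variance_of_variance_brute(n):
    samples = list(permutations(universe, n))
    m2 = [sum(x * x for x in s) / n - (sum(s) / n) ** 2 for s in samples]
    mean_m2 = sum(m2) / len(m2)
    return sum((v - mean_m2) ** 2 for v in m2) / len(m2)


for n in range(2, N + 1):
    print(n, variance_of_variance_formula(n) == variance_of_variance_brute(n))
```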


35. Sampling Without Replacements. Using the P's as defined by sampling without replacement, it appears that the coefficient of the $\mu'_4$ term,

$$\left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)N = \frac{N(N-n)(n-1)(Nn - N - n - 1)}{n^3(N-1)(N-2)(N-3)}, \qquad\{53\}$$

agrees with that given by Neyman [18; 477], Tchouproff [15; 660], Pepper [24; 234], and Carver [25; 270]. Also the coefficient of the ${\mu'_2}^2$ term,

$$\left[\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4} - \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right]N^2 = -\frac{N(N-n)(n-1)(N^2n - 3N^2 + 6N - 3n - 3)}{n^3(N-1)^2(N-2)(N-3)}, \qquad\{54\}$$

agrees with that of the above authors.

As far as the author is aware, no one has written the coefficients of $\mu'_3\mu'_1$, $\mu'_2{\mu'_1}^2$, and ${\mu'_1}^4$ in the formula for $\bar\mu_2(m_2)$. The coefficient of $\mu'_3\mu'_1$ is

$$-4\left(\frac{P_{21}}{n^3} - \frac{P_{31}}{n^4}\right)N^2 = -\frac{4N(n-1)(N-n)(Nn - N - n - 1)}{n^3(N-1)(N-2)(N-3)}. \qquad\{55\}$$

The coefficient of $\mu'_2{\mu'_1}^2$ is

$$-2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4} - \frac{P_1P_{11}}{n^3} + \frac{P_2P_{11}}{n^4}\right)N^3 = \frac{4N^2(n-1)(N-n)\left[(2n-3)N - 3(n-1)\right]}{n^3(N-1)^2(N-2)(N-3)}, \qquad\{56\}$$

while the coefficient of ${\mu'_1}^4$ is

$$\left(\frac{P_{1111}}{n^4} - \frac{P_{11}^2}{n^4}\right)N^4 = -\frac{2N^2(n-1)(N-n)\left[(2n-3)N - 3(n-1)\right]}{n^3(N-1)^2(N-2)(N-3)}. \qquad\{57\}$$

It is possible with some algebraic manipulation to use the P functions to express the coefficients of the moments as functions of N and n. The suggestion here is that such algebraic work is unnecessary, since the left members of {53}-{57} are as easily handled in an actual problem as the right-hand members. It is possible to compute the coefficients from the p's and the P's without writing explicit expansions in terms of N and n. Besides, the formulas involving N and n are so lengthy that algebraic errors are apt to occur. The use of Carver functions is further advocated because the same basic formulas are applicable to all types of sampling and because the tables of Chapter I of Part I are directly applicable.
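The remark that the left members of {53}-{57} are as easy to evaluate as the right members can be illustrated directly: the sketch below computes both sides in exact rational arithmetic for a given N and n. The function names are illustrative; the P functions are those of section 18 ($p_s = n^{(s)}/N^{(s)}$), and N must be at least 4 for the closed forms.

```python
from fractions import Fraction
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1)."""
    return prod(range(x, x - s, -1))


def carver_P(n, N):
    """P functions for sampling without replacement, built from p_s = n^(s)/N^(s)."""
    p = {s: Fraction(falling(n, s), falling(N, s)) for s in range(1, 5)}
    return {
        "1": p[1], "2": p[1] - p[2], "3": p[1] - 3 * p[2] + 2 * p[3],
        "4": p[1] - 7 * p[2] + 12 * p[3] - 6 * p[4],
        "11": p[2], "21": p[2] - p[3], "31": p[2] - 3 * p[3] + 2 * p[4],
        "22": p[2] - 2 * p[3] + p[4], "111": p[3], "211": p[3] - p[4], "1111": p[4],
    }


def coefficients_from_P(n, N):
    """Left members of {53}-{57}: coefficients of mu4, mu2^2, mu3 mu1, mu2 mu1^2, mu1^4."""
    P = carver_P(n, N)
    return (
        (P["2"] / n**2 - 2 * P["3"] / n**3 + P["4"] / n**4) * N,
        (P["11"] / n**2 - 2 * P["21"] / n**3 + 3 * P["22"] / n**4
         - (P["1"] / n - P["2"] / n**2) ** 2) * N**2,
        -4 * (P["21"] / n**3 - P["31"] / n**4) * N**2,
        -2 * (P["111"] / n**3 - 3 * P["211"] / n**4
              - P["1"] * P["11"] / n**3 + P["2"] * P["11"] / n**4) * N**3,
        (P["1111"] - P["11"] ** 2) / n**4 * N**4,
    )


def coefficients_closed_form(n, N):
    """Right members of {53}-{57}."""
    d1 = n**3 * (N - 1) * (N - 2) * (N - 3)
    d2 = n**3 * (N - 1) ** 2 * (N - 2) * (N - 3)
    c = (2 * n - 3) * N - 3 * (n - 1)
    return (
        Fraction(N * (N - n) * (n - 1) * (N * n - N - n - 1), d1),
        Fraction(-N * (N - n) * (n - 1) * (N**2 * n - 3 * N**2 + 6 * N - 3 * n - 3), d2),
        Fraction(-4 * N * (n - 1) * (N - n) * (N * n - N - n - 1), d1),
        Fraction(4 * N**2 * (n - 1) * (N - n) * c, d2),
        Fraction(-2 * N**2 * (n - 1) * (N - n) * c, d2),
    )


print(coefficients_from_P(3, 10) == coefficients_closed_form(3, 10))
```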


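36. Sampling With Replacements. If $P_{m_1\cdots m_r} = \frac{n^{(r)}}{N^r}$, the coefficient of $\bar\mu_4$ in {52} is $\frac{(n-1)^2}{n^3}$, while the coefficient of $\bar\mu_2^2$ is

$$\frac{1}{n^4}\left[(n^2 - 2n + 3)n(n-1) - (n^2 - 2n + 1)n^2\right] = \frac{(n-1)(3-n)}{n^3}.$$

Then {52} becomes

$$\bar\mu_2(m_2) = \frac{1}{n^3}\left[(n-1)^2\bar\mu_4 - (n-1)(n-3)\bar\mu_2^2\right]. \qquad\{58\}$$

The formula {50} for $\bar\mu_2(m_2)$ becomes

$$\bar\mu_2(m_2) = \frac{1}{n^3}\left[(n-1)^2\mu'_4 - 4(n-1)^2\mu'_3\mu'_1 - (n-1)(n-3){\mu'_2}^2 + 4(2n-3)(n-1)\mu'_2{\mu'_1}^2 - 2(2n-3)(n-1){\mu'_1}^4\right]. \qquad\{59\}$$

Now {58} can be written in terms of semi-invariants by the use of $\bar\mu_4 = \lambda_4 + 3\lambda_2^2$ and $\bar\mu_2 = \lambda_2$, so

$$\bar\mu_2(m_2) = \frac{1}{n^3}\left[(n-1)^2\lambda_4 + 2n(n-1)\lambda_2^2\right].$$

See [B'; 209], [22; 57].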
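A direct enumeration confirms {58} and its semi-invariant form for a small hypothetical universe, sampling with replacement. As in section 32, the raw values need not be centred, since $m_2$ is unaffected by a shift of origin.

```python
from fractions import Fraction
from itertools import product

raw = [Fraction(v) for v in (0, 2, 3, 8)]          # hypothetical universe
N = len(raw)
mean = sum(raw) / N
mu2 = sum((x - mean) ** 2 for x in raw) / N
mu4 = sum((x - mean) ** 4 for x in raw) / N


def var_of_m2_brute(n):
    """Variance of m2 over all N**n ordered samples drawn with replacement."""
    samples = list(product(raw, repeat=n))
    m2 = [sum(x * x for x in s) / n - (sum(s) / n) ** 2 for s in samples]
    avg = sum(m2) / len(m2)
    return sum((v - avg) ** 2 for v in m2) / len(m2)


def var_of_m2_formula(n):
    """Formula {58}, sampling with replacement."""
    return ((n - 1) ** 2 * mu4 - (n - 1) * (n - 3) * mu2**2) / n**3


for n in (2, 3, 4):
    print(n, var_of_m2_brute(n) == var_of_m2_formula(n))
```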


37. Different Distribution Laws. Given frequency laws can be inserted. Thus, if the universe follows {41}, {44} becomes

$$\mu'_1(m_2) = \frac{1}{n^2}(nP_1 - P_2)Npq,$$

while {52} becomes, since $\bar\mu_2 = pq$ and $\bar\mu_4 = pq(q^3 + p^3)$,

$$\bar\mu_2(m_2) = \frac{1}{n^4}\Big\{(n^2P_2 - 2nP_3 + P_4)Npq(q^3 + p^3) + (n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2p^2q^2\Big\}. \qquad\{60\}$$

Other frequency laws can be inserted similarly.


Chapter VI. Tabular Presentation of Formulas. 


It is the purpose of this dissertation to show how the P functions can be used 
in finite sampling rather than to present an exhaustive list of formulas. The 
specific formulas of the two previous chapters are derived, primarily, for illustra- 
tive purposes. The implication is that other formulas may be derived similarly. 

However, it is possible to present, implicitly in tabular form, a number of 
formulas. In this chapter there are presented formulas involving moments of 
weight equal to or less than 6. 


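38. The Formulas of Weight 2. The formulas

$$\mu'_1(m_2) = \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)(2) - \frac{P_{11}}{n^2}(1)(1)$$

$$\mu'_2({}'m_1) = \frac{P_2}{n^2}(2) + \frac{P_{11}}{n^2}(1)(1)$$

can be written in tabular form as

$$\begin{array}{c|cc|cc}
 & 2 & 11 & 2 & 11\\ \hline
2 & P_1 & & \frac{1}{n} & \\
11 & P_2 & P_{11} & -\frac{1}{n^2} & \frac{1}{n^2}
\end{array}$$

with little effort. The first entries in the top row indicate the power sums of the universe, while the columnar entries indicate the moments of the sample.

Now

$$\mu'_1(m_2) = E\left[\frac{(2)}{n} - \frac{(1)(1)}{n^2}\right] \quad\text{and}\quad \mu'_2({}'m_1) = E\left[\frac{(1)(1)}{n^2}\right].$$

The coefficients of the power sums in the expansion of $m_2$ are entered in the right hand part of the table. Thus, under 2, there appear the entries $\frac{1}{n}$ and $-\frac{1}{n^2}$. These, when multiplied by the power sums as indicated on the left, give $m_2 = \frac{(2)}{n} - \frac{(1)(1)}{n^2}$. Similarly, the single entry $\frac{1}{n^2}$ under 11 gives ${}'m_1^2 = \frac{(1)(1)}{n^2}$.

Now the expected value is given by the proper P function expansion. The left hand portion of the table, which is the same as the P function table of Chapter I of Part I, gives such expansions. Thus the coefficient of (2) in $E(m_2)$ is $\frac{P_1}{n} - \frac{P_2}{n^2}$, while the coefficient of (1)(1) is $-\frac{P_{11}}{n^2}$. Hence the complete formula is

$$\mu'_1(m_2) = \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)(2) - \frac{P_{11}}{n^2}(1)(1),$$

as indicated above.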


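39. The Formulas of Weight 3. Similarly the table

$$\begin{array}{c|ccc|ccc}
 & 3 & 21 & 111 & 3 & 21 & 111\\ \hline
3 & P_1 & & & \frac{1}{n} & & \\
21 & P_2 & P_{11} & & -\frac{3}{n^2} & \frac{1}{n^2} & \\
111 & P_3 & 3P_{21} & P_{111} & \frac{2}{n^3} & -\frac{1}{n^3} & \frac{1}{n^3}
\end{array}$$

can be used to give the formulas

$$\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)(3) - 3\left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3}\right)(2)(1) + \frac{2P_{111}}{n^3}(1)^3 \qquad\{62\}$$

$$\mu'_{11}(m_2, {}'m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)(3) + \left(\frac{P_{11}}{n^2} - \frac{3P_{21}}{n^3}\right)(2)(1) - \frac{P_{111}}{n^3}(1)^3 \qquad\{63\}$$

$$\mu'_3({}'m_1) = \frac{P_3}{n^3}(3) + \frac{3P_{21}}{n^3}(2)(1) + \frac{P_{111}}{n^3}(1)^3. \qquad\{64\}$$

In case we wish to express the results in terms of moments about the mean, (1) = 0, and we have

$$\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)(3) \qquad\{65\}$$

$$\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)(3) \qquad\{66\}$$

$$\mu'_3(m_1) = \frac{P_3}{n^3}(3) \qquad\{67\}$$

$$\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)N\bar\mu_3 \qquad\{68\}$$

$$\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)N\bar\mu_3 \qquad\{69\}$$

$$\mu'_3(m_1) = \frac{P_3}{n^3}N\bar\mu_3. \qquad\{70\}$$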
The insertion of specific sampling laws gives the specific results of earlier authors. 


40. The Tabular Forms. It is further evident that the power of n in the denominator is equal to the sum of the subscripts of the Carver function above it. We might utilize this knowledge and write in the right hand part of the table the numerators of the entries in the tables above. The table of weight 3 would then appear as

$$\begin{array}{c|ccc|ccc}
 & 3 & 21 & 111 & 3 & 21 & 111\\ \hline
3 & P_1 & & & 1 & & \\
21 & P_2 & P_{11} & & -3 & 1 & \\
111 & P_3 & 3P_{21} & P_{111} & 2 & -1 & 1
\end{array}$$

and it is easy to read {62}, {63}, {64}, {65}, {66}, and {67} directly from it.
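The table is a bookkeeping device: multiplying each right-hand coefficient by the P function expansion in the same row and summing gives the expected value. The sketch below (hypothetical five-value universe, samples of three drawn without replacement; the names `LEFT` and `M3` are illustrative) carries out that sum for the column of $m_3$ and compares it with a direct average of $m_3$ over all ordered samples.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def falling(x, s):
    """Falling factorial x^(s) = x(x-1)...(x-s+1)."""
    return prod(range(x, x - s, -1))


universe = [Fraction(v) for v in (1, 2, 4, 8, 9)]    # hypothetical universe
N, n = len(universe), 3
p = {s: Fraction(falling(n, s), falling(N, s)) for s in (1, 2, 3)}
P = {"1": p[1], "2": p[1] - p[2], "3": p[1] - 3 * p[2] + 2 * p[3],
     "11": p[2], "21": p[2] - p[3], "111": p[3]}

# Universe power-sum products (3), (2)(1), (1)^3.
U = {"3": sum(x**3 for x in universe),
     "21": sum(x**2 for x in universe) * sum(universe),
     "111": sum(universe) ** 3}

# Left part of the weight-3 table: E[row] = sum over columns of LEFT[row][col] * U[col].
LEFT = {"3": {"3": P["1"]},
        "21": {"3": P["2"], "21": P["11"]},
        "111": {"3": P["3"], "21": 3 * P["21"], "111": P["111"]}}

# Right part, column m3: numerators 1, -3, 2 over n, n^2, n^3.
M3 = {"3": Fraction(1, n), "21": Fraction(-3, n**2), "111": Fraction(2, n**3)}

table_value = sum(M3[row] * sum(c * U[col] for col, c in LEFT[row].items()) for row in M3)


def m3(s):
    """Third sample moment about the sample mean: (3)/n - 3(2)(1)/n^2 + 2(1)^3/n^3."""
    k = len(s)
    return sum(x**3 for x in s) / k - 3 * sum(x**2 for x in s) * sum(s) / k**2 + 2 * sum(s) ** 3 / k**3


samples = list(permutations(universe, n))
brute = sum(m3(s) for s in samples) / len(samples)
print(table_value == brute)
```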





TABLE I
[Formulas of weights W = 2, 3, 4, 5, 6 in tabular form.]





The tables of weight W = 2, 3, 4, 5, 6 are given in Table I. The right hand partitions not involving unit parts are underscored, as these indicate the columns which should be used if the universe is measured about its mean. As an illustration we write from Table I the value of $\mu'_2(m_2)$ when the universe is measured about its mean. We get

$$\mu'_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4) + \left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right)(2)(2),$$

as previously indicated.


The same tabular scheme can be used to write formulas of weight greater 
than 6. 








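41. Moments of Other Sample Moment Functions. It is possible to use a similar tabular scheme when we wish to find the moments of other sample moment functions. We define

$$l_2 = \frac{(2)}{n} - \frac{(1)(1)}{n^2}$$

$$l_3 = \frac{(3)}{n} - \frac{3(2)(1)}{n^2} + \frac{2(1)^3}{n^3}$$

$$l_4 = \frac{(4)}{n} - \frac{4(3)(1)}{n^2} - \frac{3(2)(2)}{n^2} + \frac{12(2)(1)(1)}{n^3} - \frac{6(1)^4}{n^4}$$

and, in general,

$$l_p = \sum(-1)^{\rho-1}(\rho-1)!\,\frac{p!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\pi_1!\cdots\pi_s!}\,\frac{(p_1)^{\pi_1}\cdots(p_s)^{\pi_s}}{n^{\rho}}, \qquad \rho = \pi_1 + \cdots + \pi_s. \qquad\{71\}$$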
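Formula {71} has the same form as the expression of semi-invariants in terms of moments about the origin, with $(p)/n$ in the role of the p-th moment, so $l_p$ is the p-th semi-invariant of the sample regarded as a distribution in its own right. A minimal sketch, with an illustrative partition generator and a hypothetical sample, evaluates {71} and checks $l_2 = m_2$, $l_3 = m_3$ and $l_4 = m_4 - 3m_2^2$:

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(p, largest=None):
    """Partitions of p as non-increasing tuples."""
    largest = p if largest is None else largest
    if p == 0:
        yield ()
        return
    for first in range(min(p, largest), 0, -1):
        for rest in partitions(p - first, first):
            yield (first,) + rest


def l_stat(sample, p):
    """Evaluate {71}: sum over partitions of (-1)^(rho-1) (rho-1)! * coefficient * products / n^rho."""
    n = len(sample)
    power = {r: sum(Fraction(x) ** r for x in sample) for r in range(1, p + 1)}  # sample power sums (r)
    total = Fraction(0)
    for part in partitions(p):
        rho = len(part)
        denom = 1
        for size, mult in Counter(part).items():
            denom *= factorial(size) ** mult * factorial(mult)
        coeff = (-1) ** (rho - 1) * factorial(rho - 1) * (factorial(p) // denom)
        term = Fraction(coeff)
        for size in part:
            term *= power[size]
        total += term / n**rho
    return total


sample = [Fraction(v) for v in (1, 2, 2, 5, 9)]       # hypothetical sample
mean = sum(sample) / len(sample)
m = {r: sum((x - mean) ** r for x in sample) / len(sample) for r in (2, 3, 4)}
print(l_stat(sample, 2) == m[2], l_stat(sample, 3) == m[3], l_stat(sample, 4) == m[4] - 3 * m[2] ** 2)
```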


The formulas of weight 5 are given by Table II. 


TABLE II
[Formulas of weight 5 in tabular form.]


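Thus, for example, when the universe is measured about its mean,

$$\mu'_{11}(l_4, l_1) = \left(\frac{P_2}{n^2} - \frac{7P_3}{n^3} + \frac{12P_4}{n^4} - \frac{6P_5}{n^5}\right)N\bar\mu_5 + \left(-\frac{10P_{21}}{n^3} + \frac{12P_{31}}{n^4} + \frac{36P_{22}}{n^4} - \frac{60P_{32}}{n^5}\right)N^2\bar\mu_3\bar\mu_2. \qquad\{72\}$$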


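If all the entries in the right hand part of Table I, except the unit terms in the main diagonal, are placed equal to 0, the tables can be used to give the moment functions of the ${}'m_r$. Thus, when W = 3,

$$\mu'_1({}'m_3) = \frac{P_1}{n}(3) \qquad\{73\}$$

$$\mu'_{11}({}'m_2, {}'m_1) = \frac{P_2}{n^2}(3) + \frac{P_{11}}{n^2}(2)(1) \qquad\{74\}$$

$$\mu'_3({}'m_1) = \frac{P_3}{n^3}(3) + \frac{3P_{21}}{n^3}(2)(1) + \frac{P_{111}}{n^3}(1)^3 \qquad\{75\}$$

and, when the universe is measured about its mean,

$$\mu'_1({}'m_3) = \frac{P_1}{n}N\bar\mu_3 \qquad\{76\}$$

$$\mu'_{11}({}'m_2, m_1) = \frac{P_2}{n^2}N\bar\mu_3 \qquad\{77\}$$

$$\mu'_3(m_1) = \frac{P_3}{n^3}N\bar\mu_3. \qquad\{78\}$$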


42. Other Moment Functions. The tables give such formulas as $\mu'_r(m_a)$, $\mu'_{r_1r_2}(m_a, m_b)$, etc. If formulas for $\bar\mu_r(m_a)$, $\bar\mu_{r_1r_2}(m_a, m_b)$, etc., are needed, it is necessary to go through the usual work of changing from moments about the origin to moments about the mean.

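Let us derive a general formula for the correlation of the mean and the variance as an illustration of the use of the tabular formulas. By definition

$$r_{11}(m_2, m_1) = \frac{\bar\mu_{11}(m_2, m_1)}{\left[\bar\mu_{20}(m_2, m_1)\,\bar\mu_{02}(m_2, m_1)\right]^{1/2}}, \qquad\{79\}$$

where, with the universe measured about its mean,

$$\bar\mu_{11}(m_2, m_1) = \mu'_{11}(m_2, m_1)$$

$$\bar\mu_{20}(m_2, m_1) = \mu'_2(m_2) - {\mu'_1}^2(m_2)$$

$$\bar\mu_{02}(m_2, m_1) = \mu'_2(m_1) - {\mu'_1}^2(m_1) = \mu'_2(m_1).$$

Some of these values have appeared earlier in this paper. Without using the earlier results, we find from Table I

$$\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)N\bar\mu_3$$

$$\mu'_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)N\bar\mu_4 + \left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right)N^2\bar\mu_2^2$$

$$\mu'_1(m_2) = \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)N\bar\mu_2$$

$$\mu'_2(m_1) = \frac{P_2}{n^2}N\bar\mu_2.$$

Hence {79} becomes

$$r_{11}(m_2, m_1) = \frac{(nP_2 - P_3)\bar\mu_3}{\left[(n^2P_2^2 - 2nP_2P_3 + P_2P_4)\bar\mu_2\bar\mu_4 + (n^2P_2P_{11} - 2nP_2P_{21} + 3P_2P_{22} - n^2P_1^2P_2 + 2nP_1P_2^2 - P_2^3)N\bar\mu_2^3\right]^{1/2}}. \qquad\{80\}$$

Formula {80} gives the correlation between the variance and the mean no matter what the law of replacement. If the universe is symmetric, $\bar\mu_3 = 0$ and $r_{11}(m_2, m_1) = 0$.

The usual special cases may be obtained. When replacements are made, {80} becomes at once

$$r_{11}(m_2, m_1) = \frac{(n-1)^{1/2}\bar\mu_3}{\left[(n-1)\bar\mu_2\bar\mu_4 + (3-n)\bar\mu_2^3\right]^{1/2}},$$

as indicated by Pepper [24; 246].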


When no replacements are made {80} reduces to results previously given by 
Neyman [18; 489] and Pepper [24; 245]. 


ru(Mm2, 7m) = {81} 


43. Conclusion. The theory presented here is capable of generalization in many ways. For example, application to multivariate distributions readily follows. However, an attempt has been made in this dissertation to emphasize the essence of the method. Illustrations have been chosen to indicate its inherent generality.

It should be stated, finally, that the aim of this dissertation is not primarily to provide a list of sampling formulas, but rather to provide a method by which the desired sampling formula may be derived without too much algebraic work.

In concluding this dissertation, I wish to acknowledge the guidance and encouragement of Professor H. C. Carver. Also I wish to express my appreciation to Professor R. A. Fisher and to Professor C. C. Craig, who read the manuscript, or portions of it, and made needed suggestions for improvement. I am also indebted to Professor J. A. Nyswander and Professor T. H. Hildebrandt for valuable advice and assistance.


















BIBLIOGRAPHY

(1) Pearson, K., "On Certain Properties of the Hypergeometric Series etc.," Philosophical Magazine, 45 (1899), pp. 231-246.

(2) Pearson, K., "On the Probable Error of Frequency Constants," Biometrika, 2 (1902-3), pp. 273-281.

(3) Henderson, R., "Frequency Curves and Moments," Transactions of the Actuarial Society of America, 8 (1904), pp. 30-42.

(4) Pearson, K., "Note on the Significant or Non-significant Character of a Sub-sample Drawn from a Sample," Biometrika, 5 (1906), pp. 181-183.

(5) "Student," "The Probable Error of a Mean," Biometrika, 6 (1908), pp. 1-25.

(6) Pearson, K., "On the Probable Errors of Frequency Constants," Biometrika, 9 (1913), pp. 1-10.

(7) Isserlis, L., "On the Conditions Under Which the Probable Error of Frequency Distributions Have Real Significance," Royal Society Proceedings, 92A (1915), pp. 23-41.

(8) Isserlis, L., "On the Value of a Mean as Calculated From a Sample," Journal of the Royal Statistical Society, 81 (1918), pp. 75-81.

(9) Edgeworth, F., "On the Value of a Mean as Calculated from a Sample," Journal of the Royal Statistical Society, 81 (1918), pp. 624-632.

(10) Tchouproff, A., "On the Mathematical Expectation of a Positive Integral Power of the Difference Between the Frequency and the Probability of an Event," Proceedings of the Petrograd Polytechnic Institute.

(11) Tchouproff, A., "Zur Theorie der Stabilität Statistischer Reihen," Skandinavisk Aktuarietidskrift (1918).

(12) Tchouproff, A., "On the Mathematical Expectation of the Moments of Frequency Distributions," Part I, Biometrika, 12 (1918), pp. 140-169, 185-210; Part II, Biometrika, 13 (1920-21), pp. 283-295.

(13) Pearson, K., "Peccavimus," Biometrika, 12 (1918-19), pp. 259-281.

(14) Splawa-Neyman, J., La Revue Mensuelle de Statistique, Tome 6 (1923), pp. 1-29.

(15) Tchouproff, A., "On the Mathematical Expectation of the Moments of Frequency Distributions in the Case of Correlated Observations," Metron, 2 (1923), pp. 461-493; 646-683.

(16) Pearson, K., "Note on Professor Romanovsky's Generalization of my Frequency Curves," Biometrika, 16 (1924), pp. 116-117.

(17) Church, A., "On the Moments of the Distributions of Squared Deviations for Samples of N Drawn from an Indefinitely Large Population," Biometrika, 17 (1925), pp. 79-83.

(18) Neyman, J., "Contributions to the Theory of Small Samples Drawn From a Finite Population," Biometrika, 17 (1925), pp. 472-479.

(19) Church, A., "On the Means and Squared Deviations of Small Samples From Any Population," Biometrika, 18 (1926), pp. 321-394.

(20) Greenwood, M., and Isserlis, L., "A Historical Note on the Problems of Small Samples," Royal Statistical Society Journal, 90 (1927), pp. 347-352.

(21) Pearson, K., "Another Historical Note on the Problem of Small Samples," Biometrika, 19 (1927), pp. 207-210.

(22) Craig, C. C., "An Application of Thiele's Semi-invariants to the Sampling Problem," Metron, 7 (1928-29), pp. 3-74.

(23) Fisher, R. A., "Moments and Product Moments of Sampling Distributions," Proceedings of the London Mathematical Society, 2 (30) (1929), pp. 199-238.

(24) Pepper, J., "Studies in the Theory of Sampling," Biometrika, 21 (1929), pp. 231-258.

(25) Carver, H. C., "Fundamentals of the Theory of Sampling," Annals of Mathematical Statistics, 1 (1930), pp. 101-121; 260-274.

(26) O'Toole, A. L., "On Symmetric Functions and Symmetric Functions of Symmetric Functions," Annals of Mathematical Statistics, 2 (1931), pp. 102-149.

(27) Fisher, R. A., and Wishart, J., "The Derivation of the Pattern Formulae of Two-Way Partitions From Those of Simpler Patterns," Proceedings of the London Mathematical Society, 2-33 (1931), pp. 195-208.

(28) St. Georgescu, N., "Further Contributions to the Sampling Problem," Biometrika, 24 (1932), pp. 65-107.

(29) Wishart, J., "A Comparison of the Semi-invariants of the Distributions of the Moment and the Semi-invariant Estimates in Sampling From an Infinite Population," Biometrika, 25 (1933), pp. 52-60.

(30) Feldman, H., "Mathematical Expectations of Product Moments of Samples Drawn from a Set of Infinite Populations," Annals of Mathematical Statistics, 6 (1935), pp. 30-52.

(31) Dwyer, P. S., "Moments of Any Rational Integral Isobaric Sample Moment Function," Annals of Mathematical Statistics, 8 (1937), pp. 21-65.

BOOKS

A. Whitworth, "Choice and Chance," 4th Edition (1886).

B. Thiele, T. N., "Theory of Observations" (1903).

B'. Reprinted in Annals of Mathematical Statistics, 2 (1931), pp. 165-306.

C. Mortara, "Elementi di Statistica," Roma (1917).

D. Rietz, H. L. (Editor-in-chief), "Handbook of Mathematical Statistics" (1924).

E. Rietz, H. L., "Mathematical Statistics" (1927).





DISTRIBUTIONS OF SUMS OF SQUARES OF RANK DIFFERENCES FOR SMALL NUMBERS OF INDIVIDUALS¹

By E. G. Olds

I. INTRODUCTION


In a recent article,² reporting the results of research under a grant-in-aid from the Carnegie Corporation of New York, Hotelling and Pabst have given a comprehensive treatment of the theory and application of rank correlation and have contributed significantly to existing knowledge on the subject. It is not the purpose of this note to evaluate their contribution but to attempt the solution of a problem they suggest.

In §3³ they have given the well-known formula for rank correlation, $r' = 1 - \frac{6\Sigma d^2}{n^3 - n}$, where n is the number of individuals ranked and $\Sigma d^2 = \sum_{i=1}^{n} d_i^2$ ($d_i$ being the rank difference for the ith individual). In §5 the question of the significance of r' in small samples has been considered from the following point of view: if the value of r', obtained from a comparison of the ranks of n individuals as a possible measure of the relation between two attributes, is such that there exists a high probability that it could have occurred by virtue of a chance rearrangement of the n individuals, then the value of r' does not furnish a significant indication of relationship. Then one test of the significance of a particular value of r' is to note whether it has a probability less than P (P equal to .01 or, less stringently, equal to .05) of occurring because of a chance re-ranking.

To apply this test it is necessary to have some information regarding the 
distribution of r’ for the chance rearrangements of the numbers from 1 to n. 
Hotelling and Pabst have given the distribution of r’ for the cases, n = 2, 3, 4. 
They have noted that the distribution is symmetrical for each value of n and 
that it has a range from −1 to 1. From a consideration of the probabilities
corresponding to Σd² = 0, 2, 4, 6, they have discussed the significance of values
of r' for n = 5, 6, 7. In §8 they have stated, "Another problem is to find convenient
and accurate approximations to the distribution of r', for moderate
values of n, with close limits of error. A table calculated along the lines suggested
in §5 would be very useful." This statement, along with the interest
manifested by others in private communications, has led to the investigation 
reported in this paper. 


¹ Presented to the American Mathematical Society, December 29, 1936.

² Harold Hotelling and Margaret Pabst, "Rank Correlation and Tests of Significance Involving No Assumption of Normality," Annals of Mathematical Statistics, Vol. VII, 1936, pp. 29-43.

³ Loc. cit.








E. G. OLDS 












II. EXACT DISTRIBUTION OF SUMS OF SQUARED DIFFERENCES


In the paper mentioned above, the authors have given the exact probabilities for all possible values of r' for n = 2, 3, and 4. Since r' is a linear function of Σd² for any particular value of n, there is a one-to-one correspondence between values of Σd² and values of r'. For example, for the case n = 3 we have the following:

Σd²      0     2     6     8
r'       1    .5   −.5    −1
3! p     1     2     2     1

where p represents the relative frequency of r' or of Σd². It therefore seems pertinent to investigate the distribution of Σd² for various values of n.
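For small n the distribution can also be found by brute force, which gives an independent check on frequencies worked out by hand. The following is a minimal Python sketch. The names `sum_sq_distribution` and `r_prime` are illustrative and are not the author's. It enumerates all n! re-rankings, so it is only practical for small n, which is why the constructive methods that follow are of interest.

```python
from collections import Counter
from itertools import permutations


def sum_sq_distribution(n):
    """Frequencies of S = sum of squared rank differences over all n! re-rankings.

    The original ranking is 1, 2, ..., n; each permutation p is a re-ranking,
    and d_i = p_i - i is the rank difference of the ith individual.
    """
    counts = Counter()
    for p in permutations(range(1, n + 1)):
        counts[sum((p_i - i) ** 2 for i, p_i in enumerate(p, start=1))] += 1
    return dict(sorted(counts.items()))


def r_prime(sum_sq, n):
    """Rank correlation r' = 1 - 6 * Sum(d^2) / (n^3 - n)."""
    return 1 - 6 * sum_sq / (n ** 3 - n)


if __name__ == "__main__":
    for s, f in sum_sq_distribution(3).items():
        print(s, r_prime(s, 3), f)
```

For n = 3 it prints the rows `0 1.0 1`, `2 0.5 2`, `6 -0.5 2` and `8 -1.0 1`, which agree with the table above.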

If n individuals are ranked 1, 2, 3, ..., n by one criterion and are then re-ranked at random, there are n! possibilities for the new ranking. Let us consider the differences between the numbers in the new and in the original rankings. Suppose these differences are represented by $d_1, d_2, \ldots, d_n$. Then it is apparent that $\sum_{i=1}^{n} d_i = 0$. Let $a_1, a_2, \ldots, a_k$ represent an arrangement for n = k. If we insert k + 1 after $a_k$ and advance the cycle one position at a time, we have the following arrangements for the case n = k + 1:

$$\begin{array}{l}
a_1,\ a_2,\ a_3,\ \ldots,\ a_k,\ k+1 \\
a_2,\ a_3,\ \ldots,\ a_k,\ k+1,\ a_1 \\
a_3,\ \ldots,\ a_k,\ k+1,\ a_1,\ a_2 \\
\quad\vdots \\
a_k,\ k+1,\ a_1,\ \ldots,\ a_{k-1} \\
k+1,\ a_1,\ a_2,\ \ldots,\ a_k
\end{array}$$


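Now, for n = k, $d_1 = a_1 - 1$, $d_2 = a_2 - 2$, ..., $d_k = a_k - k$. If we list the differences for the k + 1 derived arrangements, we have

$$\begin{array}{l}
d_1,\ d_2,\ d_3,\ \ldots,\ d_k,\ 0 \\
d_2+1,\ d_3+1,\ \ldots,\ d_k+1,\ 1,\ d_1-k \\
d_3+2,\ \ldots,\ d_k+2,\ 2,\ d_1-k+1,\ d_2-k+1 \\
\quad\vdots \\
d_k+k-1,\ k-1,\ d_1-2,\ \ldots,\ d_{k-1}-2 \\
k,\ d_1-1,\ d_2-1,\ \ldots,\ d_k-1
\end{array} \qquad (2)$$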


SUMS OF SQUARES OF RANK DIFFERENCES 135 


It is apparent that each row of differences is formed as follows. The entry in the first column is obtained by adding 1 to the entry in column two of the row above. The entry in the second column is obtained by adding 1 to the entry in the third column of the row above, and so on. The exception is the entry in the last column, which is obtained by subtracting k from the entry in the first column of the preceding row.

If we form the sum of squares of the entries in each row, we observe an interesting property of the set: the sums are all congruent, modulo (k + 1). Let us write the sums, denoting them by $S_1, S_2, \ldots, S_{k+1}$. Also let $d_{ij}$ represent the entry in the ith row and jth column. Then, since the entries of every row sum to zero,

$$\begin{aligned}
S_{i+1} = \sum_{j=1}^{k+1} d_{i+1,j}^2
&= \sum_{j=1}^{k} (d_{i,j+1} + 1)^2 + (d_{i1} - k)^2 \\
&= \sum_{j=1}^{k+1} (d_{ij} + 1)^2 + (d_{i1} - k)^2 - (d_{i1} + 1)^2 \\
&= \sum_{j=1}^{k+1} (d_{ij}^2 + 2d_{ij} + 1) - 2d_{i1}(k+1) + (k-1)(k+1) \\
&= S_i + 0 + (k+1) - 2d_{i1}(k+1) + (k-1)(k+1) \\
&= S_i + (k - 2d_{i1})(k+1). \qquad (3)
\end{aligned}$$

Noticing that $d_{i1} = d_i + i - 1$ for i = 1, 2, ..., k, and $d_{k+1,1} = k$, we have

$$\begin{aligned}
S_2 &= S_1 + (k - 2d_1)(k+1) \\
S_3 &= S_2 + (k - 2d_2 - 2)(k+1) \\
S_4 &= S_3 + (k - 2d_3 - 4)(k+1) \\
&\ \ \vdots \\
S_{k+1} &= S_k + (k - 2d_k - 2k + 2)(k+1) \\
S_{k+2} &= S_{k+1} + (k - 2k)(k+1) = S_{k+1} - k(k+1).
\end{aligned} \qquad (4)$$

Of course, $S_{k+2} = S_1$, since the (k + 2)nd row is identical with the first and the set is closed. So we may write

$$S_{k+1} = S_1 + k(k+1). \qquad (5)$$

The analysis above not only establishes the congruence of the sums, modulo (k + 1). It also indicates a method of deriving the sums for n = k + 1 from the sums for n = k, since $S_1 = \sum_{i=1}^{k} d_i^2$. Note also that $S_i$ depends not only on $S_{i-1}$ (and therefore on $S_1$) but also on $d_{i-1,1}$ (and therefore on $d_{i-1}$).
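The recurrence $S_{i+1} = S_i + (k - 2d_{i1})(k+1)$, together with (4) and (5), is easy to confirm mechanically. The Python sketch below uses hypothetical helper names. It builds the k + 1 derived arrangements by cyclic insertion and computes their sums directly. It then compares those sums with the ones obtained from the n = k differences alone. This is a check under these assumptions, not part of the original method.

```python
from itertools import permutations


def squared_sum(arrangement):
    """Sum of (a_j - j)^2 over positions j = 1..len(arrangement)."""
    return sum((a - j) ** 2 for j, a in enumerate(arrangement, start=1))


def cyclic_family(arrangement):
    """The k + 1 arrangements for n = k + 1: append k + 1 to a_1..a_k,
    then advance the cycle one position at a time."""
    base = tuple(arrangement) + (len(arrangement) + 1,)
    return [base[i:] + base[:i] for i in range(len(base))]


def sums_by_recurrence(d):
    """S_1, ..., S_{k+1} from the n = k differences d_1..d_k alone, using
    S_{i+1} = S_i + (k - 2 d_{i1})(k + 1) with d_{i1} = d_i + i - 1."""
    k = len(d)
    sums = [sum(x * x for x in d)]  # S_1: the appended difference is 0
    for i, d_i in enumerate(d, start=1):
        sums.append(sums[-1] + (k - 2 * (d_i + i - 1)) * (k + 1))
    return sums


def check_recurrence(k):
    """Compare the recurrence, (5), and the congruence mod k + 1 with direct sums."""
    for a in permutations(range(1, k + 1)):
        d = [a_j - j for j, a_j in enumerate(a, start=1)]
        direct = [squared_sum(b) for b in cyclic_family(a)]
        if direct != sums_by_recurrence(d):
            return False
        if direct[-1] != direct[0] + k * (k + 1):   # relation (5)
            return False
        if len({s % (k + 1) for s in direct}) != 1:  # all congruent mod k + 1
            return False
    return True


if __name__ == "__main__":
    print(sums_by_recurrence([3, 0, 0, -3]))  # [18, 8, 18, 18, 38]
```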




















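Another matter needs attention: the relation between the sums of squares of deviations for a particular order and for the reverse order. Let $a_1, a_2, \ldots, a_k$ be a particular arrangement. Then the reverse order is $a_k, a_{k-1}, \ldots, a_1$. The sums of the squares of the deviates are, respectively,

$$\begin{aligned}
S &= (a_1 - 1)^2 + (a_2 - 2)^2 + \cdots + (a_k - k)^2, \\
\bar S &= (a_k - 1)^2 + (a_{k-1} - 2)^2 + \cdots + (a_1 - k)^2.
\end{aligned} \qquad (6)$$

Then

$$\begin{aligned}
S + \bar S &= [(a_1 - 1)^2 + (a_1 - k)^2] + [(a_2 - 2)^2 + (a_2 - k + 1)^2] + \cdots + [(a_k - k)^2 + (a_k - 1)^2] \\
&= \sum_{r=1}^{k} [(a_r - r)^2 + (a_r - k + r - 1)^2] \\
&= \sum_{r=1}^{k} [(a_r - r) + (a_r - k + r - 1)]^2 - 2\sum_{r=1}^{k} (a_r - r)(a_r - k + r - 1) \\
&= 4\sum_{r=1}^{k} a_r^2 - 4(k+1)\sum_{r=1}^{k} a_r + (k+1)^2 \sum_{r=1}^{k} 1 - 2\sum_{r=1}^{k} a_r^2 \\
&\qquad + 2(k+1)\sum_{r=1}^{k} a_r - 2(k+1)\sum_{r=1}^{k} r + 2\sum_{r=1}^{k} r^2.
\end{aligned}$$

Noting that $\sum a_r^2 = \sum r^2$ and $\sum a_r = \sum r$, we readily obtain the result⁴

$$S + \bar S = 4\sum_{r=1}^{k} r^2 - 4(k+1)\sum_{r=1}^{k} r + k(k+1)^2 = \frac{k^3 - k}{3}. \qquad (7)$$

It is now apparent that the sums range from 0 to (n³ − n)/3, with a mean of (n³ − n)/6 (pair each arrangement with its reverse).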
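Relation (7) and the range and mean that follow from it can also be checked by enumeration. Here is a small Python sketch; the function names are ours, not the author's.

```python
from fractions import Fraction
from itertools import permutations


def squared_sum(arrangement):
    """S = sum of (a_j - j)^2 for an arrangement of 1..n."""
    return sum((a - j) ** 2 for j, a in enumerate(arrangement, start=1))


def reversal_identity_holds(n):
    """Check S + S_bar = (n^3 - n)/3 for every arrangement and its reverse."""
    target = (n ** 3 - n) // 3  # (n - 1) n (n + 1) is divisible by 6
    return all(squared_sum(p) + squared_sum(p[::-1]) == target
               for p in permutations(range(1, n + 1)))


def range_and_mean(n):
    """Smallest and largest S, and the exact mean over all n! arrangements."""
    sums = [squared_sum(p) for p in permutations(range(1, n + 1))]
    return min(sums), max(sums), Fraction(sum(sums), len(sums))


if __name__ == "__main__":
    print(reversal_identity_holds(6), range_and_mean(6))
```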





As the exact frequencies for sums of squares do not seem to be available, it 
seems useful to compute them for certain small values of n and, at the same time 





⁴ The geometric representation of the problem may be of some interest. In Euclidean n-space, let the coordinates of point R be (1, 2, 3, ..., n), the coordinates of R̄ be (n, n − 1, ..., 2, 1), and the coordinates of P be $(x_1, x_2, \ldots, x_n)$. Let us restrict the x's to be the numbers 1, 2, 3, ..., n, but not necessarily in the order given. That is, the locus of P is a set of n! points, corresponding to the permutations of the numbers 1, 2, 3, ..., n. Then it is easy to see that $\sum_{i=1}^{n} x_i = \frac{n^2 + n}{2}$, so that the points P lie on an (n − 1)-flat, or hyperplane. Also $\sum_{i=1}^{n} x_i^2 = \frac{n(n+1)(2n+1)}{6}$, so the points P lie on a hypersphere with center at the origin. Let us consider the joins PR and PR̄. It is readily established that they are orthogonal: their inner product $\sum (i - x_i)(n + 1 - i - x_i)$ vanishes because $\sum x_i = \sum i$ and $\sum x_i^2 = \sum i^2$. Then $(PR)^2 + (P\bar R)^2 = (R\bar R)^2 = \frac{n^3 - n}{3}$. But $(PR)^2 = \sum_{i=1}^{n} (x_i - i)^2 = S$ and $(P\bar R)^2 = \sum_{i=1}^{n} (x_i - n + i - 1)^2 = \bar S$, so $S + \bar S = \frac{n^3 - n}{3}$, a result previously established otherwise.









to devise a method which can be used successfully to extend the computation to
larger values of n if desired. The details of the method follow.

Let $D_n$ represent any series of n differences, $d_1, d_2, \ldots, d_n$. Let $O_n$ be an operator such that $O_n$ operating on $D_n$ (written $O_n(D_n)$) changes $D_n = (d_1, d_2, \ldots, d_n)$ to $(d_2 + 1, d_3 + 1, \ldots, d_n + 1, d_1 - (n - 1))$. A number m written after $d_1, d_2, \ldots, d_n$ indicates that Σd² = m. For n = 3,

$$\begin{aligned}
D_{3,1} &= (0, 0, 0):0 \\
O_3(D_{3,1}) = D_{3,2} &= (1, 1, -2):6 \\
O_3(D_{3,2}) = D_{3,3} &= (2, -1, -1):6
\end{aligned}$$

But we have shown that $S + \bar S = (k^3 - k)/3$ for n = k. Therefore, for n = 3, we have $S + \bar S = 8$. Sums of 0 and 6 thus indicate corresponding sums of 8 and 2 when the order of the elements is reversed. Thus we have, for n = 3,

Sums          0   2   6   8
Frequencies   1   2   2   1





For n = 4 we have

$$\begin{aligned}
D_{4,1,1} &= (0, 0, 0, 0) \\
D_{4,2,1} &= (1, 1, -2, 0) \\
D_{4,3,1} &= (2, -1, -1, 0)
\end{aligned}$$

These are obtained from $D_{3,1}$, $D_{3,2}$ and $D_{3,3}$, respectively, by inserting a zero as a fourth difference. We operate on each of these four times with $O_4$. For example,

$$\begin{aligned}
D_{4,2,1} &= (1, 1, -2, 0):6 \\
O_4(D_{4,2,1}) = D_{4,2,2} &= (2, -1, 1, -2):10 \\
O_4(D_{4,2,2}) = D_{4,2,3} &= (0, 2, -1, -1):6 \\
O_4(D_{4,2,3}) = D_{4,2,4} &= (3, 0, 0, -3):18 \\
O_4(D_{4,2,4}) = D_{4,2,1} &= (1, 1, -2, 0)
\end{aligned}$$

As a check on computation, we notice two things. First, the set is closed by the reappearance of $D_{4,2,1}$. Second, 10, 6, 18 and 6 are congruent, modulo 4. In like fashion, one of the sets for n = 5 is the following:

$$\begin{aligned}
D_{5,2,4,1} &= (3, 0, 0, -3, 0):18 \\
D_{5,2,4,2} &= (1, 1, -2, 1, -1):8 \\
D_{5,2,4,3} &= (2, -1, 2, 0, -3):18 \\
D_{5,2,4,4} &= (0, 3, 1, -2, -2):18 \\
D_{5,2,4,5} &= (4, 2, -1, -1, -4):38 \\
D_{5,2,4,1} &= (3, 0, 0, -3, 0)
\end{aligned}$$
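The operator $O_n$ is simple to program. In the Python sketch below, the names `O`, `orbit` and `distribution_by_orbits` are ours. `orbit` closes a set of differences under $O_n$ and reproduces the n = 4 and n = 5 sets above. `distribution_by_orbits` builds every set for n from the sets for n − 1 by appending a zero, as was done for n = 4. The sketch does not use the reversal shortcut $S + \bar S = (n^3 - n)/3$, so it does twice the necessary work.

```python
from collections import Counter


def O(d):
    """The operator O_n: (d_1, ..., d_n) -> (d_2 + 1, ..., d_n + 1, d_1 - (n - 1))."""
    n = len(d)
    return tuple(x + 1 for x in d[1:]) + (d[0] - (n - 1),)


def orbit(d):
    """Apply O_n repeatedly until the starting set of differences reappears."""
    start = tuple(d)
    members = [start]
    nxt = O(start)
    while nxt != start:
        members.append(nxt)
        nxt = O(nxt)
    return members


def distribution_by_orbits(n):
    """Frequencies of Sum d^2 for n, built only from O-operations: every set for
    n - 1 gets a zero appended as its last difference and is then closed under O_n."""
    sets = [(0,)]
    for _ in range(2, n + 1):
        sets = [member for d in sets for member in orbit(d + (0,))]
    return dict(sorted(Counter(sum(x * x for x in d) for d in sets).items()))


if __name__ == "__main__":
    for d in orbit((1, 1, -2, 0)):
        print(d, sum(x * x for x in d))
    print(distribution_by_orbits(4))
```

Each orbit has exactly n members, because rotating an arrangement of distinct ranks by fewer than n places never reproduces it.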


Of course the sums for n = 5 can be obtained from those for n = 4 by making use of (4). For $D_{4,2,4} = (3, 0, 0, -3):18$ we have $S_1 = 18$, k = 4, $d_1 = 3$, $d_2 = 0$, $d_3 = 0$, $d_4 = -3$. Then

$$\begin{aligned}
S_1 &= 18 \\
S_2 &= 18 + (4 - 2\cdot 3)(5) = 8 \\
S_3 &= 8 + (4 - 2\cdot 0 - 2)(5) = 18 \\
S_4 &= 18 + (4 - 2\cdot 0 - 4)(5) = 18 \\
S_5 &= 18 + (4 - 2(-3) - 6)(5) = 38
\end{aligned}$$

However, results obtained by this latter method do not help with the case n = 6. To obtain results for n = 6 we shall need to exhibit the complete sets of differences for n = 5, as we did by the former method.

An alternative method for obtaining frequencies of sums of squares is of some interest. It will be illustrated for n = 4. Let us consider the square array

$$\begin{vmatrix}
a_1 & b_1 & c_1 & d_1 \\
a_2 & b_2 & c_2 & d_2 \\
a_3 & b_3 & c_3 & d_3 \\
a_4 & b_4 & c_4 & d_4
\end{vmatrix}$$

If we form all possible products $a_i b_j c_k d_l$ (i, j, k, l = 1, 2, 3, 4, no two of them equal), the subscripts give the 4! permutations of 1, 2, 3, 4. Now let us form a new array

$$\begin{vmatrix}
a_0 & b_1 & c_2 & d_3 \\
a_{-1} & b_0 & c_1 & d_2 \\
a_{-2} & b_{-1} & c_0 & d_1 \\
a_{-3} & b_{-2} & c_{-1} & d_0
\end{vmatrix}$$

where the subscripts in each column represent the vertical distance of the term above the principal diagonal. The original terms had subscripts giving all possible arrangements of 1, 2, 3, 4. Hence terms formed in a similar fashion from the new array will give all possible arrangements of the differences. Now form a third array

$$\begin{vmatrix}
x^0 & x^1 & x^4 & x^9 \\
x^1 & x^0 & x^1 & x^4 \\
x^4 & x^1 & x^0 & x^1 \\
x^9 & x^4 & x^1 & x^0
\end{vmatrix}$$

where the exponent of x is the square of the corresponding subscript in the


























TABLE I
Frequencies of sums of squares of rank differences

Σd²   n=2   n=3   n=4   n=5   n=6   n=7
  0     1     1     1     1     1     1
        *
  2     1     2     3     4     5     6
  4          *0     1     3     6    10
  6           2     4     6     9    14
  8           1     2     7    16    29
 10                *2     6    12    26
 12                 2     4    14    35
 14                 4    10    24    46
 16                 1     6    20    55
 18                 3    10    21    54
 20                 1    *6    23    74
 22                      10    28    70
 24                       6    24    84
 26                      10    34    90
 28                       4    20    78
 30                       6    32    90
 32                       7    42   129
 34                       6    29   106
                                *
 36                       3    29   123
 38                       4    42   134
 40                       1    32   147
 42                            20    98
 44                            34   168
 46                            24   130
 48                            28   175
 50                            23   144
 52                            21   168
 54                            20   144
 56                                *184

*The asterisk shows the location of the mean. The frequencies for n = 6, 7 extend beyond the limits of the table but may easily be obtained by symmetry.
















second array. It is easy to see that, if terms are formed from the new array by the same method as before, our terms are powers of x whose exponents are sums of squares of differences. If we now define the array to be equal to the sum of the terms formed from it, then

$$\begin{vmatrix}
x^0 & x^1 & x^4 & x^9 \\
x^1 & x^0 & x^1 & x^4 \\
x^4 & x^1 & x^0 & x^1 \\
x^9 & x^4 & x^1 & x^0
\end{vmatrix} = k_1 x^0 + k_2 x^2 + k_3 x^4 + \cdots + k_{10} x^{18} + k_{11} x^{20},$$

and the k's give the desired frequencies for the sums of squares that appear as exponents of x. For example, Σd² = 0 occurs $k_1$ times, Σd² = 2 occurs $k_2$ times, etc.

It can be readily verified that, for n < 5, the array can be expanded as a determinant, and the k's are then the absolute values of the coefficients in the expansion. Also, considering the arrays as determinants, their values for n = 2, 3, 4 are, respectively, $(1 - x^2)$, $(1 - x^2)^2(1 - x^4)$, and $(1 - x^2)^3(1 - x^4)^2(1 - x^6)$. If it were possible to obtain a general form of this type, it might be possible greatly to reduce the labor involved in expanding the arrays. At present, however, this method of attack does not seem feasible, for three reasons: the lack of adequate sub-checks, the amount of work involved, and its inappropriateness for use by inexperienced clerical help.
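The array method can be imitated with polynomials stored as dictionaries that map exponents to coefficients. The Python sketch below uses illustrative names throughout. It expands the n × n array whose (i, j) entry is $x^{(j-i)^2}$ in two ways: as the sum of terms defined above (every product counted with sign +), and as a determinant. This lets one check the stated factorizations for n = 2, 3, 4 and the absolute-value relation. It enumerates all n! terms, so it does nothing to reduce the labor discussed above.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation of 0..n-1, from its number of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def array_polynomials(n):
    """Expand the n x n array with (i, j) entry x^((j - i)^2) in two ways.

    Returns (total, det) as {exponent: coefficient} dicts: 'total' adds every
    product term with coefficient +1; 'det' gives each term its determinant sign.
    """
    total, det = {}, {}
    for p in permutations(range(n)):
        e = sum((p[i] - i) ** 2 for i in range(n))
        total[e] = total.get(e, 0) + 1
        det[e] = det.get(e, 0) + sign(p)
    return (dict(sorted(total.items())),
            {e: c for e, c in sorted(det.items()) if c != 0})


def poly_mul(a, b):
    """Product of two polynomials stored as {exponent: coefficient}."""
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] = out.get(ea + eb, 0) + ca * cb
    return {e: c for e, c in sorted(out.items()) if c != 0}


def one_minus(power):
    """The polynomial 1 - x**power."""
    return {0: 1, power: -1}


def product_of_factors(powers):
    """Multiply together the factors (1 - x**p) for p in powers."""
    result = {0: 1}
    for p in powers:
        result = poly_mul(result, one_minus(p))
    return result


if __name__ == "__main__":
    total4, det4 = array_polynomials(4)
    print(total4)
    print(det4 == product_of_factors([2, 2, 2, 4, 4, 6]))
```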

Hotelling and Pabst⁵ have given exact results in terms of n for the cases Σd² = 0, 2, 4, 6. It is certainly possible to follow their method to obtain general results for Σd² larger than 6, but, as they suggest, the work becomes very laborious. For Σd² = 8 we need the sets of possible integral values for $x_1, x_2, \ldots, x_n$ under the following conditions: (a) $\sum_{i=1}^{n} x_i = 0$, (b) $\sum_{i=1}^{n} x_i^2 = 8$, (c) $1 + x_1, 2 + x_2, 3 + x_3, \ldots, n + x_n$ are the numbers 1, 2, 3, ..., n (but not necessarily in that order).
Possible solutions are:

(a) $x_{i-2} = 2$, $x_{i-1} = 0$, $x_i = -2$ (i = 3, 4, ..., n), and the other x's zero,

(b) $x_{b-2} = 2$, $x_{b-1} = -1$, $x_b = -1$, $x_{a-1} = 1$, $x_a = -1$ (a = 5, 6, ..., n; b = 3, 4, ..., a − 2),

(c) $x_{b-1} = 1$, $x_b = -1$, $x_{a-2} = 2$, $x_{a-1} = -1$, $x_a = -1$ (a = 5, 6, ..., n; b = 2, 3, ..., a − 3),

(d) $x_{b-2} = 1$, $x_{b-1} = 1$, $x_b = -2$, $x_{a-1} = 1$, $x_a = -1$ (a = 5, 6, ..., n; b = 3, 4, ..., a − 2),

(e) $x_{b-1} = 1$, $x_b = -1$, $x_{a-2} = 1$, $x_{a-1} = 1$, $x_a = -2$ (a = 5, 6, ..., n; b = 2, 3, ..., a − 3),

(f) $x_{a-1} = x_{b-1} = x_{c-1} = x_{d-1} = 1$; $x_a = x_b = x_c = x_d = -1$ (a = 8, 9, ..., n; b = 6, 7, ..., a − 2; c = 4, 5, ..., b − 2; d = 2, 3, ..., c − 2),

with the other x's zero in (b) through (f) as well.
Frequencies for each of these types must be considered separately. The


⁵ Loc. cit., p. 35.














































method of evaluation will be illustrated for type (f), since this type yields the polynomial of highest degree. It is apparent that the required frequency is obtained by computing

$$\sum_{a=8}^{n} \left( \sum_{b=6}^{a-2} \left( \sum_{c=4}^{b-2} \left( \sum_{d=2}^{c-2} 1 \right) \right) \right).$$

It can be verified that the result is

$$\binom{n-4}{4} = \frac{(n-4)(n-5)(n-6)(n-7)}{24}.$$

The total of (a), (b), (c), (d), (e), and (f) is

$$\binom{n-2}{1} + 4\binom{n-3}{2} + \binom{n-4}{4}.$$

For Σd² = 10, the result seems to be

$$\binom{n-5}{5} + 6\binom{n-4}{3} + 2\binom{n-3}{2} + 2\binom{n-3}{1}.$$

For sums greater than 8 the method becomes quite uninviting. The necessary analysis is intricate, there are many opportunities for mechanical errors, and satisfactory checks are lacking. Besides, if the exact distribution for a particular value of n is desired, we need expressions for Σd² = 0, 2, 4, ..., (n³ − n − 12)/6. For n as small as 8, this means that 42 formulas are required. It is fairly evident that these formulas will comprise polynomials ranging in degree from 0 to 41.
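The expressions above can be compared with complete enumeration for small n. In the following Python sketch, all names are ours. `binom` treats out-of-range binomial coefficients as zero, an assumption that makes the expressions meaningful for small n. `type_f_count` evaluates the quadruple sum for type (f) literally, and `brute_force_counts` counts re-rankings directly.

```python
from collections import Counter
from itertools import permutations


def binom(a, b):
    """Binomial coefficient C(a, b), taken as 0 when b < 0 or a < b."""
    if b < 0 or a < b:
        return 0
    value = 1
    for i in range(b):
        value = value * (a - i) // (i + 1)
    return value


def type_f_count(n):
    """The quadruple sum over a, b, c, d for type (f), evaluated literally."""
    return sum(1 for a in range(8, n + 1) for b in range(6, a - 1)
               for c in range(4, b - 1) for d in range(2, c - 1))


def count_8(n):
    """Number of re-rankings with Sum d^2 = 8: types (a) through (f)."""
    return binom(n - 2, 1) + 4 * binom(n - 3, 2) + binom(n - 4, 4)


def count_10(n):
    """Number of re-rankings with Sum d^2 = 10, from the expression above."""
    return binom(n - 5, 5) + 6 * binom(n - 4, 3) + 2 * binom(n - 3, 2) + 2 * binom(n - 3, 1)


def brute_force_counts(n):
    """(frequency of Sum d^2 = 8, frequency of Sum d^2 = 10) by full enumeration."""
    sums = Counter(sum((p_i - i) ** 2 for i, p_i in enumerate(p, start=1))
                   for p in permutations(range(1, n + 1)))
    return sums[8], sums[10]


if __name__ == "__main__":
    for n in range(3, 9):
        print(n, brute_force_counts(n), (count_8(n), count_10(n)))
```

For n = 5, 6, 7 the counts agree with the entries 7, 16, 29 and 6, 12, 26 of table I.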


III. APPROXIMATIONS


Since the exact distributions of sums of squares are not easily obtained, we next consider the problem of finding approximations for them. Hotelling and Pabst⁶ have given a method of deriving the even moments of the distribution of r' (the odd moments being zero), and have recorded the values of the second and fourth moments. They have also remarked that the kurtosis, β₂ = μ₄/μ₂², approaches 3 and that the distribution of r' approaches normality as n approaches infinity. These are valuable and interesting results. Because of them the normal curve suggests itself as an approximating function. Its use is considered later in this investigation.

But a distribution with a finite range causes trouble at the tails when a normal fit is attempted, and for this problem we are particularly interested in the tails. It seems more feasible to attempt an approximation with the Pearson type II curve,

$$y = y_0 \left(1 - \frac{x^2}{a^2}\right)^{m}.$$

This has the advantage of a finite range and three


⁶ Loc. cit., p. 32 et seq.










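constants to be determined. The values of these constants, as given by Elderton,⁷ are

$$m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}, \qquad a^2 = \frac{2\mu_2\beta_2}{3 - \beta_2}, \qquad y_0 = \frac{N}{a}\cdot\frac{\Gamma(2m + 2)}{2^{2m+1}\,[\Gamma(m + 1)]^2} \qquad (8)$$

(where N is the total frequency).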


If we use this distribution to approximate the distribution of sums of squares, it proves convenient to define x as one-half the deviation of Σd² from its mean, i.e.,

$$x = \frac{\Sigma d^2}{2} - \frac{n^3 - n}{12}.$$

Then the relative frequency of Σd² = k is approximated by

$$\int_{x_k - 1/2}^{x_k + 1/2} f(x)\,dx \approx f(x_k), \qquad \text{where } x_k = \frac{k}{2} - \frac{n^3 - n}{12}.$$

(Of course, closer approximations may be obtained, if desired.) The approximation used is clear if we remember that only even values of k are possible. Successive values of $x_k$ therefore differ by 1, and the range of x is now of length (n³ − n)/6.
The moments for x are now obtained from the moments for r' by multiplying by the proper powers of (n³ − n)/12, since x = −(n³ − n)r'/12. We have

$$\mu_2(x) = \frac{n^2 (n - 1)(n + 1)^2}{144}.$$

The value of β₂ is unchanged. For r' or x it is

$$\beta_2 = \frac{3(25n^4 - 13n^3 - 73n^2 + 37n + 72)}{25n(n + 1)^2(n - 1)}.$$












For n = 5, μ₂ = 25, β₂ = 2.0720, N = 5!. Using these values and equations (8), we obtain a = 10.566, m = .73276, y₀ = 7.8545. The approximating function is

$$y = 7.8545\left(1 - \frac{x^2}{111.64}\right)^{.73276}.$$

In table II the computed values of y and the true frequencies are listed for comparison.

When testing the significance of a particular value of Σd², our principal interest is in the probability that Σd² ≤ k, rather than in the probability that Σd² = k. The probability that Σd² ≤ k requires cumulation of frequencies, followed by division by the total frequency. A comparison of the results in table II shows that the maximum error in using the type II function is .0194 and the
average error is .0072. Comparisons for other values of n are given in table III.
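The fitting procedure for n = 5 can be retraced numerically. The Python sketch below uses hypothetical function names. It evaluates μ₂ and β₂ from the formulas above and obtains a, m and y₀ from equations (8). It then cumulates the type II ordinates at $x_k$ in place of the exact frequencies, as described. Under these assumptions it should give constants close to a = 10.566, m = .73276 and y₀ = 7.8545, and a maximum and average error near .0194 and .0072. Small differences can arise from rounding in the printed table.

```python
import math


def mu2_x(n):
    """Second moment of x = Sum(d^2)/2 - (n^3 - n)/12: n^2 (n - 1)(n + 1)^2 / 144."""
    return n * n * (n - 1) * (n + 1) ** 2 / 144


def beta2(n):
    """Kurtosis of r' (and of x), as given in the text."""
    return (3 * (25 * n**4 - 13 * n**3 - 73 * n**2 + 37 * n + 72)
            / (25 * n * (n + 1) ** 2 * (n - 1)))


def type_ii_constants(mu2, b2, total):
    """Pearson type II constants (a, m, y0) from equations (8)."""
    m = (5 * b2 - 9) / (2 * (3 - b2))
    a = math.sqrt(2 * mu2 * b2 / (3 - b2))
    y0 = total * math.gamma(2 * m + 2) / (a * 2 ** (2 * m + 1) * math.gamma(m + 1) ** 2)
    return a, m, y0


def compare_cumulatives(n, exact):
    """Max and mean |exact - type II| cumulative proportion for Sum d^2 = 0, 2, ... .

    `exact` is a list of (k, frequency) pairs; the type II ordinate at
    x_k = k/2 - (n^3 - n)/12 is used as the approximate frequency of k.
    """
    total = math.factorial(n)
    a, m, y0 = type_ii_constants(mu2_x(n), beta2(n), total)
    cum_exact = cum_approx = 0.0
    diffs = []
    for k, freq in exact:
        x = k / 2 - (n ** 3 - n) / 12
        cum_exact += freq
        cum_approx += y0 * (1 - x * x / (a * a)) ** m
        diffs.append(abs(cum_exact - cum_approx) / total)
    return max(diffs), sum(diffs) / len(diffs)


# Exact frequencies for n = 5, Sum d^2 = 0, 2, ..., 20 (the mean), from table I.
EXACT_5 = [(0, 1), (2, 4), (4, 3), (6, 6), (8, 7), (10, 6),
           (12, 4), (14, 10), (16, 6), (18, 10), (20, 6)]

if __name__ == "__main__":
    print(type_ii_constants(mu2_x(5), beta2(5), 120))
    print(compare_cumulatives(5, EXACT_5))
```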








⁷ Elderton, W. P., Frequency Curves and Correlation, Layton, London, 2nd ed., 1927, p. 84.










TABLE II
Comparison of exact and approximate frequencies for n = 5
(Approximations obtained by computing ordinates of y = 7.8545(1 − x²/111.64)^.73276)

            Frequencies          Cumulative (expressed       Difference of
                                 as fractions of 120)         cumulatives
 Σd²     Exact   Approximate     Exact     Approximate
   0       1        1.50         .0083       .0125               .0042
   2       4        3.04         .0417       .0378               .0039
   4       3        4.21         .0667       .0729               .0062
   6       6        5.14         .1167       .1157               .0010
   8       7        5.91         .1750       .1650               .0100
  10       6        6.52         .2250       .2193               .0057
  12       4        7.01         .2583       .2777               .0194
  14      10        7.39         .3417       .3393               .0024
  16       6        7.65         .3917       .4031               .0114
  18      10        7.80         .4750       .4681               .0069
  20       6        7.85         .5250       .5335               .0085
                                           Average of absolute values = .0072














TABLE III
Approximating functions, with errors involved

Type II curves: y = y₀(1 − x²/a²)^m. Normal curves: y = (N/√(2πμ₂)) e^(−x²/(2μ₂)).

                                                 Average and maximum absolute values
      Type II constants            Normal        of differences of cumulatives
 n      y₀        a²        m        2μ₂       Exact−type II   Exact−normal   Type II−normal
                                                 av.    max.    av.    max.    av.    max.
 5     7.8545   111.64   .73276      50        .0072  .0194   .0200  .0415   .0210  .0357
 6    31.652    351.75   1.3715     122.5      .0030  .0126   .0131  .0273   .0136  .0270
 7   156.33       …      2.0160     261.33     .0017  .0067   .0106  .0221   .0108  .0209
 8      …         …      2.6635     504                                      .0086  .0175
 9      …      4332.6    3.3140     900                                        …      …
10      …      8266.6    3.9655    1512.5                                      …      …






































It would be very convenient if the cumulative frequencies could be approximated by the use of normal curves. Table III lists the proper normal curves, with comparisons against the exact values and against the values obtained from the type II curves. For the values of n investigated, the normal curve is not as satisfactory as the type II. This, of course, is to be expected, because the fourth moment of the normal curve does not agree with that of the exact distribution. However, for the values of n investigated, the maximum and average errors decrease as n increases. It therefore seems satisfactory to sacrifice accuracy to expedience and use the normal curve as an approximating function for n greater than 10. This has been done in constructing table V. In further justification, β₂, which approaches 3 as n approaches infinity, is an increasing function of n for n greater than 3.


IV. TABLES TO TEST THE SIGNIFICANCE OF THE RANK CORRELATION COEFFICIENT, 
WITH EXAMPLES OF THEIR USE 


Table IV gives, for any given value of n and a computed value of Σd² less than or equal to the mean, the probability that the value will not be exceeded by chance. For a value of Σd² greater than or equal to the mean, it gives the probability that the value will be equalled or exceeded. The values for n = 2, 3, 4, 5, 6, 7 are computed from exact frequencies; those for n = 8, 9, 10 are computed from type II curves.

Table V is constructed by the use of normal curves. It gives the limits of
Σd² for a few of the more useful probabilities.
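Table V can be regenerated under two assumptions. First, its normal curves have mean (n³ − n)/6 and the standard deviation of Σd² implied by var(r') = 1/(n − 1), the second moment given by Hotelling and Pabst. Second, no continuity correction is applied. The Python sketch below uses `statistics.NormalDist`; the function names are ours. Under these assumptions it reproduces, for example, the n = 13 limits quoted in Example 3.

```python
from statistics import NormalDist


def table_v_limits(n, P):
    """Normal-curve limits between which Sum d^2 falls with probability P.

    Mean (n^3 - n)/6; standard deviation (n^3 - n)/(6 sqrt(n - 1)), which follows
    from Sum d^2 = (n^3 - n)(1 - r')/6 and var(r') = 1/(n - 1).
    """
    mean = (n ** 3 - n) / 6
    sd = mean / (n - 1) ** 0.5
    z = NormalDist().inv_cdf(0.5 + P / 2)
    return round(mean - z * sd, 1), round(mean + z * sd, 1)


def lower_tail(n, s):
    """Normal approximation to P(Sum d^2 <= s), without a continuity correction."""
    mean = (n ** 3 - n) / 6
    return NormalDist(mean, mean / (n - 1) ** 0.5).cdf(s)


if __name__ == "__main__":
    for P in (0.99, 0.98, 0.96, 0.90):
        print(P, table_v_limits(13, P))
    print(round(lower_tail(13, 144), 4))  # the case n = 13, Sum d^2 = 144
```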

It seems desirable to explain why values of Σd² were tabled rather than values of r'. It was done for two reasons: first, to avoid the difficulties arising from discrete variates; and, second, because the tables seem more useful in the form given, since the labor of completing the calculation of r' can be avoided if the computed value of Σd² tests as not significant.

Example 1. Seven individuals are ranked by two criteria, as indicated below. Are the results significantly alike?

A     …
B     …
d     …
d²    1   1   9   1   1   1   4      Σd² = 18

Solution: Rows 3 and 4 give the differences and squared differences, respectively. If we enter table IV with n = 7 and Σd² = 18, we find P = .0548. So we would expect a value as small as 18 to occur by chance more than 5% of the time. This does not usually indicate significance, so it is useless to compute the value of r'. It is interesting to notice that r' actually does prove to be equal to .68 and that, if we had used the formula

$$\sigma_{r'} = 1.0471\,\frac{1 - r'^2}{\sqrt{n}},$$

we might have




















TABLE IV
The probability that Σd² ≥ S for S at or above the mean value of the sum of squares, or that Σd² ≤ S for S at or below the mean


~ 


0 





| .5000, .1667| .0417, .0083 | .0014 | .0002 | .0003 | .0001 | .0000 
| .5000, .5000) .1667) .0417 | .0083 | .0014 | .0006 | .0002 | .0001 
| .5000, .2083| .0667 | .0167 | .0034 | .0011 | .0003 
.5000, .3750, .1167 | .0292 | .0062 | .0018 | .0005 | . 
| .1667) 4583, .1750 | .0514 | .0119 | .0028 | .0007 | . 
| | .5417, .2250 | .0681 | .0171 | .0042 | .0010 | . 
4583, .2583 | .0875 | .0240 | .0059 | .0015 | . 
| .3750| .3417 | .1208 | .0331 | .0081 | .0020 | . 
| 2083, .3917 | .1486 | .0440 | .0108 | .0027 | 





| .1667| .4750 | .1778 | | 0141 | .0035 | 
.0417) .5250 | .2097 | | .0179 | .0045 
.4750 | .2486 | .0833 | .0224 | .0057 | . 
.3917 | .2819 | .1000 | .0275 | .0071 | 
.3417 | .3292 | .1179 | .0331 | .0087 
| .2583 | .3569 | .1333 | .0396 | .0106 | . 
.2250 | .4014 | .1512 | .0469 | .0127 
.1750 | .4597 | .1768 | .0550 | .0152 | . 
.1167 | .5000 | .1978 | .0639 | .0179 | . 
.0667 | .5000 | .2222 | .0736 | .0210| . 
0417 | .4597 | .2488 | .0841 | .0244 | 
| 0083 | .4014 | .2780 | .0956 | .0281 | . 


| 3569 | .2974 | .1078 | .0323 | . 

.3292 | .3308 | .1207 | .0368 | . 
| .2819 | .3565 | .1345 | .0417 | . 
| 2486 | .3913 | .1491 | .0470 | . 
| .2097 | .4198 | .1645 | .0528 | . 
| .1778 | .4532 | .1806 | .0589 | . 
| .1486 | .4817 | .1974 | .0656 | . 
| .1208 | .5183 | .2150 | .0726 | . 

0875 | .4817 | .2332 | .0802 | . 





| 














TABLE IV (Continued)

















.0514 


. 2520 
.2715 
.2915 










.3120 








































































. 1645 


68 .0083 | .3308 .3330 1248 .0394 
70 0014 | = .2974 3544 .1351 0432 
72 2780 .3761 1459 .0472 
74 2488 3982 1571 0515 
76 2222 4205 1688 .0561 
7 | | .1978 | .4431 1809 0609 
80 | .1768 4657 1935 0659 
82 1512 4885 2065 .0713 
84 1333 5113 2198 .0769 
86 .1179 4885 .2336 .0828 
88 | 1000 4657 2477 .0889 
90 | | 0833 4431 2622 0954 
92 | 0694 4205 .2770 1021 
4 0548 .3982 2922 1091 
6 | 0440 3761 3077 1164 
98 .0331 3544 3234 1239 
100 0240 .3330 .3394 .1318 
102 0171 .3120 3557 1399 
104 0119 2915 3721 .1483 
106 0062 2715 3888 .1570 
108 .0034 2520 4056 1659 
110 0014 2332 4226 1751 
112 0002 2150 .4397 1846 
114 .1974 4568 1944 
116 .1806 4741 2044 









. 1491 
. 1345 


. 1207 











TABLE IV—Concluded 


























1078 | .4568 | .2580 
| .0956 | .4397 | .2694 
| .0841 | .4226 | .2810 
.0736 | .4056 | .2928 
.0639 | .3888 | .3048 
.0550 | .3721 | .3169 











0469 | .3557 | .3293 
| 


.0396 | .3394 | .3418 
0331 | | 13545 


.3673 











.0275 | 
0224 |. | .3802 
0179 |. | .3932 


0141 | .2622 | .4063 
0108 | .2477. | —.4196 
0081 | .2336 | .4328 


0059 | .2198 | .4462 
0042 | .2065 ~~ «.4596 
0028 | .1985 | .4731 210 


0018 | .1809  .4865 212° 
0011 | 16885000 214 
166 | .0006 | .1571 | .5000 216 | 




















(Tables for n = 9 and n = 10 can be completed by symmetry.)


judged the value of r' significant, since σ_{r'} = .213, and .213 is less than one-third of .68.

Example 2. Six golfers ranked their scores and also ranked their respective amounts of sleep on the previous night. They found that the two orders were the reverse of one another, except that the two ranking 1, 2 in sleep ranked 5, 6 in score. Is the negative correlation too great to be reasonably attributed to chance?

Solution: We find Σd² = 68 and, upon consulting table IV, P = .0083, so we
conclude that more sleep might mean fewer strokes.
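The exact entries of table IV for n ≤ 7 amount to counting re-rankings, so Examples 1 and 2 can be checked directly. In the Python sketch below, `exact_p` and `sum_sq` are illustrative names. The ranks for Example 2 are read off from its description, with the golfers listed in order of sleep rank. The last line evaluates the large-sample formula quoted in Example 1 with r' = .68.

```python
from itertools import permutations


def exact_p(n, s):
    """Exact P(Sum d^2 <= s) when s is at or below the mean (n^3 - n)/6,
    otherwise P(Sum d^2 >= s), over all n! equally likely re-rankings."""
    mean = (n ** 3 - n) / 6
    sums = [sum((p_i - i) ** 2 for i, p_i in enumerate(p, start=1))
            for p in permutations(range(1, n + 1))]
    if s <= mean:
        hits = sum(1 for v in sums if v <= s)
    else:
        hits = sum(1 for v in sums if v >= s)
    return hits / len(sums)


def sum_sq(ranks_a, ranks_b):
    """Sum of squared differences between two rankings of the same individuals."""
    return sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))


# Example 2: golfers listed by sleep rank 1..6; score ranks are the reverse
# order except that sleep ranks 1 and 2 have score ranks 5 and 6.
sleep = [1, 2, 3, 4, 5, 6]
score = [5, 6, 4, 3, 2, 1]

if __name__ == "__main__":
    s = sum_sq(sleep, score)
    print(s, round(exact_p(6, s), 4))          # 68 0.0083
    print(round(exact_p(7, 18), 4))            # 0.0548
    r = 1 - 6 * 18 / (7 ** 3 - 7)
    print(round(r, 2), round(1.0471 * (1 - 0.68 ** 2) / 7 ** 0.5, 3))   # 0.68 0.213
```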

Example 3. Before an examination a teacher ranked his class of 13 members. After the examination he found that the sum of the squares of the deviations of rank on examination from rank estimated was 144. Should he consider the agreement satisfactory?


TABLE V
Pairs of values between which Σd² has a probability, P, of being included

 n        P = .99         P = .98         P = .96         P = .90





11 
12 | 
13 
14 | 
15 | 
16 
17 | 
18 | 


19 
20 
21 
22 | 


40.8 

60.9 

93.3 
125.9 


174.5 

227.8 
290.5 
363 .6 


399 .2 
505.1 
634.7 | 
780.1 


58. 
82. 
119. 
161. 


211. 
271. 
341. 
422. 


945.5 | 
1132.2 
1341.4 


1574.4 


| 1088.4 
| 1290.6 | 
| 1515.7 


381.8 
483.6 
608.4 
748.6 


908.2 





252.6 


wl 
105.9 

148.2 
195.8 | 


319.4 
397.0 | 
486 .3 


105.6 
141.2 
191.2 

247 4 


313.8 
391.2 
480 .4 
582.4 


334.4 
424.8 
536.8 
662.6 
806.2 
968.8 
1151.6 
1355.6 


554.6 
667.8 
—| 


1270.2 





447.9 

544.1 
653.0 
775.5 


| 2766.5 


1832.1 
2115.9 
2427 .0 


514. 
620. 





872. 


738 .§ 


| 1765.1 


2039.8 


| 2341.1 
| 2670.0 


588 .2 
703.4 
832.8 
977.3 








23 


24 
25 | 


26 | 


27 | 
28 | 
29 | 
30 


912.5 
1064.7 
1233.0 
1418.2 


3135.5 
3535.3 | 
3967 .0 
4431.8 


1020. 


1184. 
1365. 
1564. 


| 2027.8 
| 3415.7 
| 3834.6 
| 4285.9 


1137.8 

1315.1 
1510.1 
1723.6 


1691.8 


1956.6 | 
2247 .2 
2564 .7 


| 2910.2 

| 3284.9 
| 3689.9 
| 4126.4 


1135.3 | 


698 .0 
828.1 
973.6 


1314.2 
1511.1 
1727.0 





1962.7 





1621.1 
1842.7 
2083 .7 





2345 .0 


6645.0 


4930.9 
5465.3 | 
6036.3 


1781. 
2018. 
2275. 
2553. 


| 


4770.5 
5289.9 
5744.9 


6436.8 | 


1956 .5 
2209.8 
2484.3 
2780.8 


4595.5 
5098 . 2 
5635.7 
6209 .2 


2219.2 


2497 .3 
2797.9 | 
3122.0 


1582.0 
1831.9 
2106.4 
2406.7 


2733.8 





3088 .9 
3473.0 
3887.3 


4332.8 
4810.7 
5322.1 


795.6 | 
939.0 


1098.7 
1275.7 | 


1471.0 | 


1685.4 
1919.8 
2175.3 
2452.6 
2752.8 
3076.7 


1484.4 
1721.0 
1981.3 
2266 .3 
2577 .0 
2914.6 
3280.2 
3674.7 
4099 .4 
4555 .2 
5043.3 


5868.0 | 3425.2 | 5564.8 


Solution: Entering table V with n = 13 we see that P = .96 for a value between 
148.2 and 579.8, and that P = .98 for a value between 119.6 and 608.4. There- 
fore the probability of not exceeding 144 by chance is between .02 and .01. It 
would seem that the teacher showed considerable knowledge of his class. 


CARNEGIE INSTITUTE OF TECHNOLOGY. 





NOTE ON CORRELATIONS
By D. B. DE LURY


When the value of a correlation coefficient is to be estimated from a set of N pairs of observations, $(x_i, y_i)$, i = 1, 2, ..., N, the statistic ordinarily computed is, of course, the product-moment correlation coefficient,

$$r = \frac{s_{12}}{s_1 s_2}, \quad\text{where}$$

$$n s_1^2 = \sum_{i=1}^{N} (x_i - \bar x)^2, \qquad n s_2^2 = \sum_{i=1}^{N} (y_i - \bar y)^2, \qquad n s_{12} = \sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y),$$

$$N\bar x = \sum_{i=1}^{N} x_i, \qquad N\bar y = \sum_{i=1}^{N} y_i, \qquad n = N - 1.$$


However, when x and y are known to have the same population mean and variance, the precision of the estimate may be improved slightly by using the intraclass correlation coefficient,

$$r' = \frac{2\sum_{i=1}^{N} (x_i - \bar z)(y_i - \bar z)}{\sum_{i=1}^{N} \{(x_i - \bar z)^2 + (y_i - \bar z)^2\}}, \qquad 2N\bar z = \sum_{i=1}^{N} (x_i + y_i).$$


It may be of interest to inquire into the properties of an analogous coefficient, appropriate to the case of equal variances and different means. This coefficient would naturally be chosen to be

$$u = \frac{2 s_{12}}{s_1^2 + s_2^2} = \left\{\frac{2 s_1 s_2}{s_1^2 + s_2^2}\right\} r.$$

Obviously, |u| ≤ |r|, since $2 s_1 s_2 \le s_1^2 + s_2^2$.
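The three coefficients can be computed side by side. The Python sketch below follows the definitions above, with divisor n = N − 1 for $s_1^2$, $s_2^2$ and $s_{12}$; the name `correlation_coefficients` is ours. Shifting one variable by a constant leaves r and u unchanged but alters the intraclass r'. That shift is exactly the situation of equal variances and different means for which u is intended.

```python
def correlation_coefficients(xs, ys):
    """Return (r, r_intraclass, u) for paired observations (x_i, y_i)."""
    N = len(xs)
    n = N - 1
    xbar, ybar = sum(xs) / N, sum(ys) / N
    zbar = (sum(xs) + sum(ys)) / (2 * N)  # 2N zbar = sum of (x_i + y_i)
    s1_sq = sum((x - xbar) ** 2 for x in xs) / n
    s2_sq = sum((y - ybar) ** 2 for y in ys) / n
    s12 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    r = s12 / (s1_sq * s2_sq) ** 0.5
    r_intra = (2 * sum((x - zbar) * (y - zbar) for x, y in zip(xs, ys))
               / sum((x - zbar) ** 2 + (y - zbar) ** 2 for x, y in zip(xs, ys)))
    u = 2 * s12 / (s1_sq + s2_sq)
    return r, r_intra, u


if __name__ == "__main__":
    print(correlation_coefficients([1, 2, 3, 4], [3, 6, 9, 12]))
```

For x = 1, 2, 3, 4 and y = 3x, r = 1 and u = .6 (up to floating-point rounding), while r' = −.2 because the two means differ.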

The probability distribution of u is easily determined under the assumption that x and y obey a bivariate normal distribution. If σ² is their common variance, no restriction is introduced by taking σ = 1. Then the probability element of $s_1$, $s_2$, r is known to be¹

$$\frac{n^n (s_1 s_2)^{n-1} (1 - r^2)^{\frac{n-3}{2}}}{\pi (n-2)!\,(1 - \rho^2)^{n/2}} \exp\left\{-\frac{n\,(s_1^2 - 2\rho r s_1 s_2 + s_2^2)}{2(1 - \rho^2)}\right\} ds_1\,ds_2\,dr,$$

where ρ is the correlation of x and y. From this, the distribution of u can be obtained by making the transformation

$$u = \left\{\frac{2 s_1 s_2}{s_1^2 + s_2^2}\right\} r, \qquad v = \frac{2 s_1 s_2}{s_1^2 + s_2^2}, \qquad w = s_1^2 + s_2^2.$$


¹ R. A. Fisher, Biometrika, Vol. 10, p. 510.








150 D. B. DE LURY 





Under this transformation, the range of $s_1$, $s_2$, r, determined by the inequalities $0 < s_i < \infty$ (i = 1, 2), −1 < r < 1, is mapped in a two-fold manner upon the space −v ≤ u ≤ v, 0 ≤ v ≤ 1, 0 < w < ∞. For fixed u, v ranges from u to 1 or from −u to 1, according as u is positive or negative, and w runs from 0 to ∞. The probability element of u, v, w is found to be

$$\frac{(n/2)^n\, v\,(v^2 - u^2)^{\frac{n-3}{2}}\, w^{n-1}}{\pi (n-2)!\,(1 - \rho^2)^{n/2} \sqrt{1 - v^2}} \exp\left\{-\frac{n w (1 - \rho u)}{2(1 - \rho^2)}\right\} du\,dv\,dw,$$

and the distribution of u, obtained by integrating with respect to v and w, is

$$K (1 - \rho^2)^{n/2} (1 - u^2)^{\frac{n-2}{2}} (1 - \rho u)^{-n}\,du, \qquad K = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n}{2}\right)}.$$


If ρ = 0, the distribution of u is identical with that of r, the product-moment correlation coefficient (for ρ = 0), in samples of (N + 1) pairs of observations. Therefore, to test the hypothesis of independence using the coefficient u, the methods and tables appropriate to testing the same hypothesis using the coefficient r are available. The precision gained by using u rather than r is equivalent to that supplied by another pair of observations.

In the general case, the transformation introduced by R. A. Fisher,²

$$u = \tanh z, \qquad \rho = \tanh \zeta,$$

leads to the distribution element³

$$K\,\operatorname{sech}^n (z - \zeta)\,dz.$$

This distribution is invariant in form under varying ζ, and is effectively normal for samples of any size. In all cases, z is an unbiased estimate of ζ, since the distribution is symmetrical about ζ.

The variance of z can be obtained by the following device. Denote by I(2p, n) the 2p-th moment of z about the mean,

$$I(2p, n) = K \int_{-\infty}^{\infty} x^{2p} \operatorname{sech}^n x \, dx,$$

K being the constant, depending on n, given above. Integration by parts gives the recurrence formula

$$I(2p, n) = \frac{n^2}{(2p + 1)(2p + 2)} \{ I(2p + 2, n) - I(2p + 2, n + 2) \}, \qquad p \ge 0.$$


² Metron, Vol. 1, No. 4, p. 7.
³ The distributions of u and z for n = 1 have been given by R. A. Fisher, Metron, Vol. 1, No. 4.












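From this follows at once the relation

$$\begin{aligned}
I(2p+2, n+2) &= I(2p+2, 1) - (2p+1)(2p+2)\left\{\frac{I(2p,1)}{1^2} + \frac{I(2p,3)}{3^2} + \cdots + \frac{I(2p,n)}{n^2}\right\}, && n \text{ odd}, \\
&= I(2p+2, 2) - (2p+1)(2p+2)\left\{\frac{I(2p,2)}{2^2} + \frac{I(2p,4)}{4^2} + \cdots + \frac{I(2p,n)}{n^2}\right\}, && n \text{ even}.
\end{aligned}$$

The values of I(2p + 2, 1) and I(2p + 2, 2) can be found without evaluating the integrals, by letting n → ∞. It can be shown that $I(2p, n) = O(n^{-p})$, and hence $\lim_{n\to\infty} I(2p, n) = 0$ for p > 0. We obtain

$$I(2p+2, 1) = (2p+1)(2p+2)\left\{\frac{I(2p,1)}{1^2} + \frac{I(2p,3)}{3^2} + \frac{I(2p,5)}{5^2} + \cdots\right\},$$

$$I(2p+2, 2) = (2p+1)(2p+2)\left\{\frac{I(2p,2)}{2^2} + \frac{I(2p,4)}{4^2} + \frac{I(2p,6)}{6^2} + \cdots\right\}.$$

Hence, for all values of n and p (replacing n + 2 by n),

$$I(2p+2, n) = (2p+1)(2p+2)\left\{\frac{I(2p,n)}{n^2} + \frac{I(2p,n+2)}{(n+2)^2} + \frac{I(2p,n+4)}{(n+4)^2} + \cdots\right\}.$$

Setting p = 0 (so that I(0, n) = 1) to get the variance,

$$\mu_2 = I(2, n) = 2\left\{\frac{1}{n^2} + \frac{1}{(n+2)^2} + \frac{1}{(n+4)^2} + \cdots\right\}.$$

Therefore, making use of the fact that

$$\int_m^\infty \frac{dx}{x^2} < \sum_{i=m}^{\infty} \frac{1}{i^2} < \int_{m-1}^\infty \frac{dx}{x^2},$$

we find that

$$\frac{1}{n} < \mu_2 < \frac{1}{n - 2}.$$

From the numerical values of μ₂ for small values of n, it appears that the approximation μ₂ ≈ 1/(n − 1) is satisfactory in all cases. In the same way, it can be shown that

$$\frac{3}{n^2} < \mu_4 < \frac{3}{(n - 2)^2}.$$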
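The series for μ₂ and its bounds can be checked numerically. The Python sketch below sums the series and adds an integral estimate for the remainder. Independently, it integrates $K z^2 \operatorname{sech}^n z$ by the trapezoidal rule, which is very accurate here because the integrand is smooth and decays exponentially. Agreement of the two is consistent with the recurrence. The function names, truncation lengths and step sizes are our own choices.

```python
import math


def mu2_series(n, terms=200_000):
    """mu_2 = I(2, n) = 2 * (1/n^2 + 1/(n+2)^2 + ...), summed for `terms` terms;
    the remainder is replaced by the integral estimate 1/(n + 2*terms - 1)."""
    partial = sum(2.0 / (n + 2 * j) ** 2 for j in range(terms))
    return partial + 1.0 / (n + 2 * terms - 1)


def mu2_quadrature(n, half_width=40.0, steps=20_000):
    """K * integral of z^2 sech^n z dz by the trapezoidal rule,
    with K = Gamma((n + 1)/2) / (sqrt(pi) Gamma(n/2))."""
    K = math.gamma((n + 1) / 2) / (math.sqrt(math.pi) * math.gamma(n / 2))
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * z * z * (1.0 / math.cosh(z)) ** n
    return K * h * total


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, mu2_series(n), 1 / n, 1 / (n - 1), 1 / (n - 2))
```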


Thus the method of transforming correlations to test for significance, used 
by R. A. Fisher in connection with both interclass and intraclass correlations, 
is available here also, and is, in fact, slightly simpler, owing to the absence of 
bias. 
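Here is a minimal sketch of such a test, under stated assumptions. The transformation from $u$ to $z$ is defined earlier in the note and is not reproduced here, so the functions take $z$ directly. The sketch treats $z$ as normal with mean $\zeta$ and variance $1/(n-1)$, the approximation given above. Following the usual practice with Fisher's transformed correlations, it compares two independent values through the difference of their $z$'s. The function names are hypothetical.

```python
import math


def two_sided_p(deviate: float) -> float:
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(deviate) / math.sqrt(2.0))


def one_sample_test(z: float, n: int, zeta0: float) -> tuple[float, float]:
    """Test zeta = zeta0, taking z ~ Normal(zeta, 1/(n - 1)); returns (deviate, p)."""
    deviate = (z - zeta0) * math.sqrt(n - 1)
    return deviate, two_sided_p(deviate)


def two_sample_test(z1: float, n1: int, z2: float, n2: int) -> tuple[float, float]:
    """Test zeta1 = zeta2 for independent samples; var(z1 - z2) ~ 1/(n1-1) + 1/(n2-1)."""
    deviate = (z1 - z2) / math.sqrt(1.0 / (n1 - 1) + 1.0 / (n2 - 1))
    return deviate, two_sided_p(deviate)


if __name__ == "__main__":
    print(one_sample_test(0.5, 17, 0.0))   # deviate 2.0, p about 0.0455
    print(two_sample_test(0.8, 21, 0.3, 26))
```

Because $z$ is unbiased for $\zeta$, no correction for bias appears in the deviate. This is the simplification mentioned in the text.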

The coefficient $u$ can, of course, be used in all situations where the intraclass coefficient is appropriate (when the number of observations in each class is two), and conceivably in a small class of other cases as well. The test of significance is simpler using $u$ instead of $r'$, and the loss of precision is negligible.


UNIVERSITY OF TORONTO. 








JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


June, 1938 - $1.50 per Copy - $6.00 per Annum - Vol. 33 - No. 202


CONTENTS 


ARTICLES 

The Function of Deposit Banking . . . E. A. Goldenweiser
Trends of Principal Earning Assets and Their Significance . . . William J. Carson
The Economic Distribution of Demand Deposits . . . Lauchlin Currie
Trends of Bank Earnings and Expenses . . . Donald S. Thompson
Some Problems of Bank Supervision . . . Homer Jones
New Indexes of Production and Trade . . . Norris O. Johnson
A Quarterly Series of Manufacturers’ Inventories . . . D. C. Elliott
Problems in the Compilation of Data on Total Relief and Work Program Expenditures . . . Dorothy Fahs Beck
Households and Persons Receiving Relief or Assistance . . . T. J. Woofter, Jr. and T. E. Whiting
Residual Relationships and Velocity of Change as Pitfalls in the Field of Statistical Forecasting . . . Leon E. Truesdell
The Reliability of Preliminary Price Indexes . . . Walter B. Garver
Variations in Family Living Expenditures . . . Dorothy S. Brady
Discrete Frequency Distributions Arising from Mixtures of Several Single Probability Values
Comparability of Mortality Statistics . . . Theodore A. Janssen


NOTES 
The Association of Agricultural Price Fluctuations . . . Warren C. Waite
Big Business, Its Growth and Its Place . . . Rufus S. Tucker, Alfred L. Bernheim, and Margaret Grant Schneider
Reply . . . Gardiner C. Means
Revision of the Base Period for Government Index Numbers
Progress of Work in the Census Bureau . . . Joseph A. Hill
Statistical News and Notes
Board of Governors of the Federal Reserve System
Federal Home Loan Bank Board
Bureau of Foreign and Domestic Commerce
Federal Trade Commission
Graduate School of the Department of Agriculture
Bureau of Agricultural Economics
Social Security Board
Bureau of Labor Statistics, U. S. Department of Labor
United States Employment Service
Women’s Bureau, U. S. Department of Labor
Division of Social Research, Works Progress Administration
Division of Research, Statistics and Records, Works Progress Administration
Office of Education

Continued on next page







NOTES—Continued 
U. S. Public Health Service 
American Association of Schools of Social Work . 
The Brookings Institution . 
Research Projects at Dun and Bradstreet, Inc.
National Bureau of Economic Research
National Industrial Conference Board
Chapter Activities
New Members
BOOK REVIEWS 
Adams, Arthur B. Analyses of Business Cycles. Wilbert G. Fritz
American Medical Association. Group Hospitalization. Paul A. Dodd
Bowley, A. L. Wages and Income in the United Kingdom since 1860. Simon Kuznets
Campbell, E. G. The Reorganization of the American Railroad System, 1893-1900. J. P. Watson
Cowan, Donald R. G. Sales Analysis from the Management Standpoint. R. S. Alexander and R. Parker Eastwood
Cowles Commission. Report of Third Annual Research Conference on Economics and Statistics. Orville J. McDiarmid
Donaldson, John. The Dollar. Melchior Palyi
Ishii, Ryoichi. Population Pressure and Economic Life in Japan. Cyrus H. Peake
Jensen, Einar. Danish Agriculture, Its Economic Development. John D. Black
Jenkinson, Hilary. A Manual of Archive Administration. Solon J. Buck
Kendall, M. G., see Yule, G. Udny.
Killough, Hugh B. International Trade. Edward C. Welsh
Klein, Philip. A Social Study of Pittsburgh. Calvin F. Schmid
Langlet, Olof. Studier över Tallens Fysiologiska Variabilitet och dess Samband med Klimatet. Thaddeus Parr
Mainland, Donald. The Treatment of Clinical and Laboratory Data. Antonio Ciocco
Martin, Robert F. International Raw Commodity Price Control. N. H. Engle
Modley, Rudolf. How to Use Pictorial Statistics. Earl G. Millison
Neyman, J. Lectures and Conferences on Mathematical Statistics. S. S. Wilks
Pearson, Frank A., see Warren, George F.
Pigou, A. C. Socialism versus Capitalism. A. B. Wolfe
Schmid, Calvin F. Social Saga of Two Cities. Ralph C. Fletcher
Schmidt, Emerson P. Man and Society. Verne C. Wright
Simons, Henry C. Personal Income Taxation. Alfred G. Buehler
Stamp, Sir Josiah. The National Capital and Other Statistical Studies. Cleona Lewis
Tippett, L. H. C. The Methods of Statistics. Frederick E. Croxton
Warren, George F. and Pearson, Frank A. World Prices and the Building Industry. Warren C. Scoville
Waugh, Albert E. Elements of Statistical Method. S. B. Stocking
Willis, Parker B. The Federal Reserve Bank of San Francisco. George W. Dowrie
Yule, G. Udny, and Kendall, M. G. An Introduction to the Theory of Statistics.


Address orders for subscriptions and back numbers to Frederick F. Stephan, Secretary, 
American Statistical Association, 722 Woodward Building, Washington, D. C. 





THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS


Contents 


Interior and Exterior Means Obtained by the Method of Moments.
Edward L. Dodd


On the Chi-Square Distribution for Small Samples.
Paul G. Hoel


Shortest Average Confidence Intervals from Large Samples. 


Transformations of the Pearson Type III Distribution. 


A Test of the Significance of the Difference Between Means of
Samples from Two Normal Populations Without Assuming
Equal Variances. Daisy M. Starkey . . . 201


Some Efficient Measures of Relative Dispersion. Nilan Norris . . . 214


Notes on the Distribution of the Geometric Mean. 


Note on a Formula for the Multiple Correlation Coefficient.
H. M. Bacon


Vol. IX, No. 3 — September, 1938 







